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A  Review  of  Maximum  Entropy  Spectral 
Analysis  and  Applications  to  Fourier  Spectroscopy 


l.  INTRODUCTION 

This  report  is  based  on  a  talk  that  was  presented  on  April  6,  1984,  at  a  sem¬ 
inar  called  "MESA  Workshop",  sponsored  by  the  Optical  Physics  Division.  That 
seminar  was  organized  because  it  has  become  apparent  that  the  maximum  entropy 
technique  of  spectral  analysis  can  sometimes  help  in  the  resolution  of  spectra  in 
Fourier  Spectroscopy. 

The  importance  of  the  Meiximum  Entropy  Method  (MEM)  comes  from  the  fact 
that  the  resulting  power  density  spectra  (PSD's)  often  exhibit  "super-resolution": 
a  resolution  that  far  exceeds  that  of  conventional  methods.  This,  in  turn,  allows 
one  to  obtain  high  resolution  PSD's  with  much  less  data. 

This  report  provides  background  information  on  this  subject,  presents  some  of 
the  salient  features  of  the  technique,  and  illustrates  its  use  in  Fourier  Spectroscopy 
with  an  experimental  example.  The  report  also  reveals  some  of  the  peculiarities 
of  the  method,  and  shows  how  to  distinguish  which  types  of  data  represent  good 
candidates  and  which  represent  bad  candidates  for  its  use.  (There  are,  it  turns 
out.  cases  where  MEM  is  worse  than  conventional  methods.)  Finally,  the  ref¬ 
erences  are  annotated  to  guide  the  reader  to  the  most  useful  literature.  The 
latter  contain  the  details  that  are  left  out  here  due  to  space  limitations. 
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As  can  be  seen,  these  two  discrete  time  formulations  of  the  power  spectral 
density  appear  very  different.  To  someone  familiar  only  with  the  second  form, 
the  first  form  (MKM)  will  look  strange.  To  give  the  MKM  formalism  a  clear 
physical  meaning,  this  report  will  first  present  the  development  of  the  power 
spectrum  up  to  the  time  of  Burg.  We  then  consider  Jaynes'  development  of  the 
main  principle  behind  MEM.  This  principle  is  explained  in  its  general  form  and 
then  used  to  derive  the  MEM  spectrum.  After  that,  we  consider  a  more  detailed 
approach  that  uses  the  ”  Z-transform. "  This  transform  is  explained  in  Section  4, 
where  it  is  shown  that  the  Lorentzian  line  shape  is  due  to  the  nature  of  the  Z- 
transform,  rather  than  depending  directly  on  the  MEM  principle.  In  Section  5, 
the  MEM  spectrum  is  derived  from  autoregression  theory,  which  gives  the 
physical  significance  of  the  MEM  spectrum. 

The  associated  matrix  equation  is  derived  in  Section  G,  to  give  a  physically 
clear  picture  of  it.  The  practical  methods  to  solve  it  are  derived  in  Section  7. 
.Another  approach  is  through  linear  prediction,  due  to  Wiener.  This  least  squares 
approach  is  used  in  Section  8  to  derive  the  matrix  equation  from  a  completely 
different  approach.  Here  it  is  shown  that  least  squares  and  MEM  can  lead  to  the 
same  results. 

In  Section  9  we  turn  to  the  question  of  what  types  of  data  are  good  candidates 
for  MEM  treatment  and  which  are  not.  In  Section  10.  we  consider  the  so-called 
"Burg  technique"  for  determining  and  a^.  Many  advantages  of  the  Burg  tech¬ 
nique  will  be  pointed  out.  We  then  (Section  11)  apply  MEM  to  Fourier  Spectroscopy 
and  show  some  results.  Finally,  in  Section  12,  we  consider  the  interpretation, 
due  to  Burg,  of  the  power  spectrum  in  terms  of  " reflection  coefficients," 

An  annotated  bibliography  suggests  further  reading. 

2.  DEVELOPMENT  OF  THE  THEORY 

In  this  report,  a  recent  and  excellent  historical  survey  of  spectrum  analysis 

by  Robinson^  is  used.  In  addition,  a  very  understandable  introduction  to  time 

2 

series  analysis  by  Gottman  was  employed. 

2.1  From  Pythagoras  to  Fourier 

If  one  excludes  calendars  and  time-keeping  devices,  then  perhaps  the  first 
use  of  harmonic  or  spectral  analysis  was  due  to  Pythagoras  (600BC).  He  pointed 
out  that  a  string  of  a  musical  instrument  vibrated  in  a  way  that  could  be  broken 

1  Robinson,  Enders  A.  (1982)  A  historical  perspective  of  spectrum  estimation 
Proc.  IEEE  70(No.  9):885. 

2,  Gottman,  J.  M.  (1981)  Time -Series  Analysis,  Cambridge  Univ.  Press. 


down  into  "harmonics"  as  shown  in  Figure  1.  He  used  such  information  for  a 
malhematu'al  iiasis  of  musical  harmony.  From  this  time. until  tlie  1700's,  there 
seems  to  have  been  no  other  application  of  htirmonic  analysis. 


1ST  2ND  3RD 

Figure  1.  String  Harmonics 


The  mathematics  of  spectral  analysis  began,  in  its  modern  form,  in  the 
eighteenth  century.  Daniel  HernouUi  (1738)  was  the  first  to  solve  the  wave  equa¬ 
tion. 
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where  u  is  the  vertical  displacement  of  a  string,  c  is  the  velocity  of  the  wave,  x 
the  hori/ontal  coordinate,  and  t  the  time.  Ho  did  this  by  use  of  the  now  classic 
method  of  separation  of  variables; 

u(x,  f)  -  X(x)  ■r(t)  .  (2) 


The  boundary  conditions  are  given  by 


(3) 


where  'he  end  points  are  x  -  0.  t.  This  leads  to  a  solution  in  x  that  employs  sine 
(not  cosine)  wav'es  and  since  Eq.  (1)  is  linear,  one  can  use  superposition  to  get  the 
general  solution 
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u(x,  t) 


^  ^  sin  kx  {Aj^  cos  kct  +  B|^  sin  kct ) 


(4) 


k-1 


where  k  is  an  integer,  due  to  Fq.  (3). 

Here,  the  crucial  part  of  his  result  is  the  expression  for  the  initial  displace¬ 
ment  of  the  string[t  0  in  Kq.  (4)): 


u(x,  0) 


A,  sin  kx 


(5) 


k-0 


(>i  the  left,  u(x,  0)  is  an  arbitrary  function.  It  need  not  be  ’'anal.ytic'' .  On  the 
right  there  is  an  infinite  sum  of  sines,  each  of  which  is  analytic. 

Thus,  we  have  a  seeming  paradox.  ''How  c;in  a  non-analytic  function  (with, 
for  example,  discontinuous  derivatives),  be  represented  by  an  infinite  sum  of 
nothing  but  analytic  functions  d'  From  Kuler  (1707—1783)  and  Lagrange  (1736—1813) 
we  have  the  well-known  relation 


A_ 


^  J"  u(x,  0)  sin  nx  dx 
0 


(6) 


which  displays  the  "global"  nature  of  .Aj^.  It  turns  out  that  Eq.  (6)  is  useful  in 
clearing  up  the  apparent  paradox. 

The  first  announcement  that  an  almost  arbitrary  function  could  be  expanded 
as  in  Eq.  (5),  was  made  in  1807  by  Jean  Baptiste  Joseph  Fourier  to  the  French 
Academy.  (There  are  constraints  on  the  function,  but  analyticity  is  not  one  of 
them.)  The  distinguished  mathematicians  present  at  his  talk  rejected  his  result. 
Fourier  himself  was  never  able  to  provide  a  rigorous  proof,  although  we  now 
know  he  was  indeed  correct.  The  proof  of  Fourier's  assertion  made  use  of  the 
"  Z-t  ransformation"  .  ^  The  MEM  technique  of  spectral  analysis  (as  well  as  a 
very  large  and  increasing  literature  on  the  digital  processing  of  data  and  signals) 
IS  based  conceptually  upon  the  Z-transform. 

The  analytic  function  paradox  mentioned  above  is  resolved  when  the  Taylor 
series  expansion  (which  defines  analytic  functions)  is  connected  to  the  Fourier 
series  expansion  by  means  of  the  Z-transformation.  Consider,  in  place  of  u(x,  0) 
above,  an  analytic  function,  f(Z),  of  a  complex  variable  Z; 

.  -1 

Z  =  X  +  iy  =  (re-’'^) 
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Lxpand  f(Z)  in  a  Taylor  series  about  the  point  Z  0  to  obtain  what  we  drTine 
as  the  Z-transforni 


nz)  -  Y,  "n  ■ 

n  -0 


(7) 


If  Z_  is  defined  as  the  first  singular  point  from  the  origin,  then  the  radius 
0  -1  -1-1 
of  convergence  extends  from  Z  =  0  to  Z  ,  or,  in  other  words,  the 

region  of  convergence  is  where  |z|  >  l^^l.  Now,  letting  Z  =  1  exp  (-jO)  in 

Kq.  (7),  we  obtain  the  well  known  complex  form  for  the  Fourier  series 

00 

f(e  ^  a^(cos  ne  +  j  sin(nQ))  .  (8) 

n  0 

We  now  consider  three  cases  to  explain  the  paradox.  In  the  first  case,  Zp 
IS  inside  the  unit  circle.  In  this  case,  both  f(Z)  and  the  series  in  Eq.  (7)  are 
analytic  functions  and  there  is  no  paradox.  In  the  second  case,  Z^  is  outside  the 
unit  circle.  This  implies  that  the  series  does  not  converge  (as  can  be  seen)  and, 
again  there  is  no  paradox  because  the  series  is  invalid.  The  third  case  is  the  one 
that  resolves  the  paradox.  In  this  case,  is  on  the  unit  circle.  When  this 
happens,  the  Taylor  series  will  not  converge  at  some  or  all  of  the  points  on  the 
circle.  The  Tayloi  series  defines  an  analytic  function  (differentiable  to  any 
order)  inside  the  circle,  however.  On  the  circle  (the  Fourier  series  in  0)  W’e 
have  a  non-analytic  function.  The  solution  to  the  paradox  is  that  it  is  sufficient 
to  move  the  singularity  from  the  circle  to  the  inside  by  a  vanishingly  small  amount 
to  change  the  given  non-differentiable  function  of  6  into  one  which  can  be  differen¬ 
tiated  to  any  order,  that  is,  an  analytic  function.  The  proof  of  the  validity  of  the 
Fourier  series  makes  use  of  this  fact  (a  limit  is  taken).  The  reader  may  recall 
that  similar  arguments  are  used  in  Fourier  transforming  the  step-function.  The 
main  point  here  is  that  the  Z-transform  was  crucial  for  the  proof  that  Fourier's 
approach  is  correct. 

2.2  The  Periodogratn  as  Introduced  by  Sir  Arthur  Schuster 

The  next  major  step  in  spectral  analysis  of  data  came  with  the  introduction 
of  the  concept  of  the  periodogram,  P(u),  by  Sir  Arthur  Schuster  in  1898.  This  is 
defined  by  Eqs.  (9)  and  (10) 


X(u)  = 

n  I 


P(u)  =  4- 


where  the  represent  the  individual  values  of  the  time  series  data,  n  is  the 
integer  giving  it  the  (discrete)  time  label,  u  =  2>rf  where  f  is  the  frequency,  and 
N  is  the  total  number  of  data  points.  The  periodogram  was  conceived  for  the 
purpose  of  finding  "  hidden  periodicities"  in  the  data.  Implicit  in  this  approach  is 
a  model  of  the  data  that  consists  of  two  parts.  (Here  we  follow  Gottman.  )  One 
part  is  purely  deterministic  and  consists  of  one  or  more  sine  waves.  The  other 
part  consists  of  white  noise,  e^.  For  example 

Xj  =  A  cos  ut  +  B  sin  wt  +  e^^  .  (11) 

This  approach,  unfortunately,  was  plagued  by  a  feature  which  severely  limited 
its  usefulness.  As  shown  by  theory  and  experiment,  the  variability  or  standard 
deviation  of  the  periodogram  is  equal  to  its  mean  value  for  a  given  w.  In  other 
words,  the  noise  fluctuations  were  as  large  as  the  "  signal" .  Furthermore,  this 

feature  remains,  independently  of  the  length  of  data  record  or  N.  (For  a  proof, 

2  3 

the  reader  may  consult  Gottman  or  Claerbout.*)  For  this  reason  the  periodo¬ 
gram  was  considered  a  failure. 


»  < 


i  .  j 


• .  .  1 


2.3  The  Slutzky  Effect  and  the  Work  of  Yule 

In  the  1920's,  the  Russian  scientist  Slutzky  made  a  surprising  discovery.  He 

found  that  a  signal  consisting  of  pure  white  noise,  as  shown  in  Figure  2,  could 

give  rise  to  a  spectrum  with  a  clear  peak  merely  by  taking  a  moving  average 

2 

prior  to  the  spectral  analysis.  See  Figure  3,  based  on  Gottman.  In  effect,  he 
found  that  a  moving  average  can  introduce  a  periodicity,  that  is,  "create  order 
out  of  chaos".  Initially,  this  discovery  was  alarming  to  his  contemporaries 
because  they  routinely  applied  moving  averages  to  their  data  and  they  did  not 
suspect  that,  in  so  doing,  they  were  introducing  a  periodic  feature  into  their 
results.  Once  the  "Slutzky  Effect"  became  understood,  it  changed  the  way  one 
modeled  the  periodicities  of  nature.  In  effect,  this  led  to  a  model  that  consisted 


3.  Claerbout,  Jon  F.  (1976)  Fundamentals  of  Geophysical  Data  Processing. 
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Figure  2.  Pure  Noise  of  Zero  Mean  and  Unit  Variance 


Figure  3.  White  Noise  Spectrum  of  Figure  2 
is  Altered  by  4-point  Moving  Average.  It  now 
has  a  peak  due  to  this  average 


of  white  noise  as  an  input  to  a  filter,  and  the  output  (or  "filtered"  white  noise)  as 
"data  of  nature"  or  "signal  to  be  analyzed".  More  details  will  be  given  below,  but 
it  should  be  emphasized  that  this  model  is  exceedingly  different  from  the  one 
implied  by  Eq,  (11). 

Shortly  after  the  Slutzky  Effect  was  discovered,  G.  Udney  Yule  (1927)  analj  jed 
sun  spot  activity.  To  obtain  his  results,  he  originated  the  idea  that  Schuster's 
model  [Eq.  (11)]  is  not  of  general  validity.  Instead  of  this  "pure -sine -wave -plus - 
noise  process",  he  suggested  the  "filtered  noise"  model  in  the  following  graphic 


manner. 


Imagine  a  laboratory  with  a  swinging  pendulum  in  it,  and  suppose  that  an 
experiment  is  in  progress  in  which  a  transducer  is  used  to  record  the  pendulum's 
angular  position  as  a  function  of  time.  Now  suppose  that  a  group  of  young  boys 
get  into  the  laboratory  and  shoot  peas  with  their  pea-shooters  at  the  pendulum. 

The  impacts  would  randomly  perturb  both  the  amplitude  and  phase  of  the  pendu¬ 
lum.  The  resulting  time  series  would  no  longer  contain  a  deterministic  compo¬ 
nent  nor  would  there  be  an  exact  period.  Instead,  the  series  would  become 
unpredictable  (that  is,  for  long  times  into  the  future)  and  yet  it  would  appear 
smooth  and  fairly  periodic.  This  seemed  to  be  the  ideal  model  for  sun  spots. 

It  became  clear  later  that  it  was  of  general  validity  for  the  science  of  time  series 
analysis. 

Next,  consider  a  discrete  mathematical  treatment  of  the  "peas  and  pendulum" 
model.  The  pendulum  can  be  regarded  as  a  narrow  band  filter.  In  terms  of  dis¬ 
crete  time  series,  the  behavior  of  a  pendulum  can  be  modeled  by  a  homogeneous 
difference  equation 

b(n)  +  aj  b(n  -  1)  +  a2  -  2)  =  0  ,  (12) 

where  n  is  the  time  index,  a's  are  constants  that  are  given,  and  b(n)  is  the  depen¬ 
dent  variable.  The  solution  to  Eq,  (12),  easily  found  in  books  on  difference 
equations,  is 


b(n)  =  e 


■Xn  ^^*^0 


where  X  and  Wq  are  functions  of  the  constants  aj^  and  a^.  As  can  be  seen,  Eq.  (13) 
is  the  discrete  version  of  a  decaying  sine  wave  of  the  type  well  known  in  elemen¬ 
tary  differential  equation  theory.  To  add  the  effect  of  a  random  input  to  Eq.  (12), 
in  this  example  "the  peas",  a  term  c(n)  is  inserted  on  the  right  hand  side  as  in 
Eq.  (14). 

x(n)  +  a^  x(n  -  1)  +  82  x(n  -  2)  =  e(n)  ,  (14) 

where  x(n)  is  now  the  dependent  variable.  The  solution  of  Eq.  (14)  is  given  by 


x(n)  =  7  ,  b(k)  e(n  -  k) 


-  V"  A-'  •  •  A."  •.A  O.  .c_“  •  •  •  O,  O  ^  1  .  •  »  •  . 


where  b(k)  is  given  by  Kq.  (13).  The  right  hand  side  of  fq.  ( 1 . d  i>  rhc  ,  n,'.  ,Ui- 
tion  of  the  input  noise.  E(n),  and  the  filter  impulse  l  esporse  m  .  m  Uini  r 

analogous  to  the  continuou.s  theory.  Kq.  (14)  has  be<  oim-  known  ,s  '■  H  .• 
autoregressive  model  for  discrete  time  series.  In  the  ger.et  .1  .  .-i'.  ru.r  - 

ber  of  filter  (a^  coefficients  can  be  involved. 

Figure  4  gives  the  block  diagram  of  the  new  !  in.<  set  n  -  .  h  !  <  ■  .  ,■  ■  i. 

from  Slutzky  and  Yule's  work.  Quite  independ"  iit  1% .  the  i:,  atiee  ■' 
tral  analysis  of  random  processes  w.is  beim.  developed  in  ’tK  >,  •  ,  ■  •  . 

(1920—1930)  by  N’orbert  Wiener,  d'his  will  be  (  iinsnlei  t- i  n* 


Figure  4.  Block  Diagram  of  the  New  'Model'  for  the 
"Signal" 


>  J 


2.4  The  Contribution  of  Norbert  Wiener 

In  1923,  W'iener  developed  a  model  for  Brownian  movement.  This  work 
extended  the  work  of  Einstein  (1905)  and  this  Einstein-Wiener  theory  of  Brownian 
movement  is  important  to  all  theoretical  studies  of  spectrum  analysis  | according 
to  Robinson^  who  describes  this  work  in  more  detail  than  will  be  done  here] .  In 
1930.  Wiener  published  his  classic  paper  on  Generalized  Harmonic  Analysis, 
Equation  (15).  and  the  Fourier  transform  version. 


X(u))  -  B(u)  E(u) 


became  his  model  for  the  random  process.  In  Eq.  (16),  the  capital  letters  stand 
for  the  Fourier  transforms  of  the  lower  case  quantities  in  Eq.  (15).  The  con¬ 
volution  theorem  was  used  (the  Fourier  Transform  (FT)  of  a  convolution  equals 
the  product  of  the  FT'sl .  A  key  point  in  Wiener's  analysis  is 


t (n) e(k) 


where  the  bar  indicates  a  suitable  average  or  "expectation  value".  On  the  left 
side  we  have,  by  definition,  the  autocorrelation  of  a  noise  process;  and  on  the 
right,  we  have  a  Kronecker  delta  function. 

Let  us  now  consider  an  important  observation.  In  the  context  of  the  signal 
model  of  Figure  4.  which  is  used  by  Wiener,  the  usual  explanation  for  the 
failure  of  the  periodogram  (added  noise)  is  not  quite  valid.  ^  The  true  explanation 
is  the  presence  of  the  factor  E(u)).  which  is  a  very  noisy  function,  in  Eq.  (16). 

The  FT  of  a  rough  function  is  not  necessarily  a  smooth  function.  Indeed.  Wiener 
showed  in  1923  that  the  FT  was  rough  (as  Robinson's  very  useful  survey  explains). 

In  passing,  it  is  of  interest  to  note  that  Wiener  was  very  interested  not  only 
in  Brownian  Motion,  but  also  in  a  very  general  aspect  of  the  mathematical 
characterization  of  noisy  but  semi -ordered  patterns.  He  described  in  his  auto¬ 
biographical  accounts  how  he  was  inspired  by  the  sight  of  the  Charles  River,  as 
seen  from  MIT.  and  wondered  how  one  could  mathematically  describe  the  sur¬ 
face  of  the  water  with  its  random  waves. 

So  far  as  spectral  analysis  of  data  is  concerned,  Wiener's  most  concrete 
contribution  was  the  theorem  named  after  him  and  Khintchine.^  This  states  that 
the  autocorrelation,  defined  by  ^(k)  in 


s  ^  x*(n)  x(n  +  k)  ,  (18) 

n 

is  related  to  the  PSD  *(u)  by 
P 

*(u))  =  ^  ^tlk)  .  (19) 

k=-p 

where  p  is  the  maximum  lag.  This  implies  that  the  key  to  the  essentially  con¬ 
tinuous  spectrum  of  the  process  in  Figure  4  is  in  the  autocorrelation.  (See 

4 

Y.  W,  Lee  for  an  elementary  study  of  Wiener's  approach. )  Unfortunately,  this 
did  not  lead  immediately  to  practical  data  analysis.  Tukey's  breakthrough  (to  be 
described  next)  was  needed  for  results  to  be  achieved.  As  will  be  seen,  howevei , 


4.  Lee,  Y.W.  (1960)  Statistical  Theory  of  Communication,  Wiley. 

See  E.  Wolf,  J OSA  72,  No.  3,  (1982)  (sec.  2)  for  an  interesting  contrast  between 
the  treatments  of  Wiener  and  Kolmogorov. 

^See  Robinson  and  Treitel  (1980),  Appendix  16-1,  for  a  rigorous  treatment. 

^See  Wolf,  op  cit. 
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Wu-ncr's  wui  k  on  hnr.ir  prediction  and  the  "Wiener  Optimum  Filter"  based  on 
least  squar ’s  also  has  a  crucial  role  to  play  tn  MEM  spectral  analysis. 

2.5  riir  iilai'kinan  and  Tiikry  .Approach 

The  practical  approach  to  the  PSD  (which  according  to  Robinson^  was  mainly 
due  to  Tukey)  was  described  in  the  famous  book  by  Blackman  and  Tukey.  For 
fur  ther  details  the  retider  may  consult  that  reference. 

Fnlike  Wiener,  Blackman  and  Tukey  considered  the  case  where  the  data 
record  is  finite.  Fsing  their  notation,  the  autocorrelation,  C^,  is  defined  (dis¬ 
crete  case)  as 


C 


r 


E 


q=0 


XX. 
q  q+r 


(20) 


.  here  x^  is  the  time  series,  q  the  time  index,  and  r  the  lag.  One  assumes  unity 
for  i.ie  sample  interval  spacing  in  Eq.  (20).  The  total  number  of  data  points  is  n. 
The  PSD,  V^,  is  obtained  from  (in  Eq.  (20)  x^  is  real): 

m-1 

^  •  <21) 
q=l  ^  '  J 

Here,  At  is  the  sampling  interval,  m  is  the  maximum  lag,  and  the  cosine  arises 
from  the  fact  that  the  original  series  is  real.  These  authors  then  introduced  ways 
to  smooth  out  the  PSD  to  get  rid  of  the  noise  fluctuations  that  killed  the  usefulness 
of  the  periodogram. 

They  used  three  smoothing  techniques:  (a)  the  truncated  autocorrelation, 

(b)  "window  carpentry",  and  (c)  direct  spectral  smoothing.  The  last  two  tech¬ 
niques  also  help  with  the  problem  of  "spectral  leakage"  which,  as  will  be  ex¬ 
plained,  comes  from  the  fact  that  the  data  record  is  not  infinitely  long. 

To  truncate  the  autocorrelation,  the  maximum  lag  m  in  Eq.  (21)  is  limited 
to  be  a  small  fraction  of  n,  the  total  number  of  points.  Typically,  the  maxi¬ 
mum  number  of  lags  is  taken  to  be  1/10  of  the  number  of  data  points.  The 
resultant  smoothing  of  the  PSD  is  due,  as  explained  by  Blackman  and 

5 

Tukey,  to  the  resulting  increase  in  the  number  of  degrees  of  freedom  in  their 
spectral  model  based  on  the  chi -squared  statistic  (compare  this  with  Gottman's 


5.  Blackman,  R.B,,  and  Tukey,  J.W,  (1958)  The  Measurement  of  Power  Spec¬ 
tra,  Dover. 
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discussion  of  the  periodogram^).  In  effect,  the  smoothing  results  from  the  pool¬ 
ing  of  10  elementary  spectral  values  for  each  PSD  result  based  on  the  10  percent 
truncated  autocorrelation.  Their  book  contains  confidence  intervals  in  a  table  to 

guide  one  in  the  use  of  truncated  autocorrelations.  Further  discussions  of  use 

6  V 

will  be  found  in  Bendat  and  PiersoF  and  Otnes  and  Enochson. 

"Window  carpentry"  (called  apodization  in  optics)  uses  a  weighting  factor, 

W^,  which  is  unity  for  zero  lag  and  smoothly  decreases  to  nearly  zero  at  the 
maximum  lag.  Thus,  in  Eq.  (21),  is  replaced  by  C^.  Blackman  and 
Tukey  showed  that  exactly  the  same  smoothing  effects  can  be  obtained  by  means 
of  a  weighted  average  in  the  frequency  domain.  For  example,  the  use  of  a  weight 
given  by  W  -  ^  (1  +  cos  .  I  c!  <  m,  is  exactly  the  same  thing  as  using 


V'=xV  ,+7rV 

r  4  r-1  2  r  4  r+1 

in  the  frequency  domain,  and  in  either  case,  it  is  called  "Hanning".  (P.  36  of 
Blackman  and  Tukey  explains  end  point  corrections.) 

Another  smoothing  method,  due  to  Welch,  introduced  more  "degrees  of  free¬ 
dom"  in  a  more  direct  manner.  In  this  method,  the  original  data  record  was 
chopped  into  sub-sections.  The  PSD  was  obtained  for  each  of  these  subsections, 
and  then  they  were  all  averaged  together  [Eq.  (23)),  resulting  in  a  smooth  PSD. 


P(u)  =- 


"  Leakage"  is  another  practical  problem  solved  by  Blackman  and  Tukey.  This 
problem  comes  from  the  fact  that  a  finite  segment  of  data  is,  in  effect,  the  prod¬ 
uct  of  an  infinite  data  string  multiplied  by  a  rectangular  pulse  (equal  to  1  over  the 
available  data  and  zero  outside).  From  the  convolution  theorem  it  follows  that 
this  is  equivalent  to  convolving  the  true  spec' rum  with  a  sine  function 
[(sin  0)/01,  6  =  (jNAT.  Window  carpentry  alleviates  this  problem.  One  can  use 
tapered  weights  to  reduce  the  "side-lobes"  of  this  spectral  window  by  a  factor  of 
10  (these  lobes  are  the  cause  of  leakage  from  neighboring  frequencies)  but,  of 
course,  there  is  a  price.  The  resolution  decreases,  for  example,  by  a  factor  of 
2  (from  the  effect  of  smoothing)  if  one  were  to  use  a  triangular  window. 

G.  Bendat,  J.S. ,  and  Piersol,  A.  G.  (1971)  Random  Data:  Analysis  and  Meas¬ 
urement  Procedures,  Wiley-Interscience. 

7.  Otnes,  R.K. ,  and  Enochson,  L.  (1978)  Applied  Time  Series  Analysis,  Vol.  1., 
Wiley. 


Another  approach  to  the  leakage  problem  involves  what  Blackman  and  Tukey 
call  " pre -whitening"  .  For  example,  consider  Figures  5  and  fi.  Suppose  one 
considei'S  the  red  noise  shown  in  Figure  5.  Leakage  would  bring  about  an  incor¬ 
rect  slope  from  a  raw  analysis,  if,  however,  the  data  are  "first  differenced", 
that  is. 


X  -  X  , 

1  I-l 


(24) 


then  the  resulting  spectrum,  as  they  show,  is  that  shown  in  Figure  fi.  This  is  a 
flat  spectrum  and,  therefore,  no  leakage  is  possible.  After  this  is  done,  the 
PSD  IS  "post  colored"  by  means  of  a  spectral  version  of  the  inverse  transform  of 
Fq.  (24). 


LOG  OJ 


F’igure  5.  PSD  of  Original  Signal  Before  Prewhitening 


LOG  P(UI) 


■  SLOPE  •  0 


LOG  W 


Figure  6.  PSD  of  Signal  After  "  F  rst -Differencing"  or 
Prewhitening 


-r^  V-.  --^  7-  ■  ^ 


Previhitening  has  been  mentioned  here  because  it  has  relevance  to  the  MI-M 
spectrum.  Tlie  more  general  form  of  Kq.  (24)  is  a  combination  of  autoregres¬ 
sion  and  moving  averages  (see  Blackman  and  Tukey^).  As  such  it  repi-esents 
something  very  analogous  in  concept  to  MKM  or,  as  we  shall  see,  AR  analysis. 

In  effect,  prewhitening  can  be  imagined  to  be  an  "eyeball"  kind  of  MEM.  If  one 
could  produce  a  perfectly  flat  PSD  by  prewhitening,  the  recolored  spectrum  would 
be  the  MEM  spectrum  (or,  in  the  most  general  case  to  be  explained,  the  ARMA 
spectrum).  This  remark  will  be  more  meaningful  to  the  reader  later  on  when 
MEM,  AR,  and  "ARMA"  are  explained. 

Blackman  and  Tukey  seem  to  have  introduced  the  term  "aliasing"  as  depicted 

O 

in  Figure  7.  (See  also  Goldman  "Information  Theory".)  This  figure  shows  how 
the  process  of  sampling  data  at  intervals  of  At  can  give  rise  to  the  illusion  that  a 
large  wavelength  is  present  when  in  fact  it  is  a  short  one.  There  is  an  analogy 
here  with  the  effect  of  a  stroboscope  which  can  slow  down  or  even  stop  the  motion 
of  a  rotating  fan.  To  avoid  such  aliasing,  one  filters  out  all  high  frequencies  be¬ 
fore  Sampling  the  data.  The  filter  must  remove  all  frequencies  above  the  Nyquist 
frequency  given  by  (1/(2  At))  if  there  is  to  be  no  aliasing. 


Figure  7.  Actual  Signal  Is  Given  by  the  Full  Curve.  The  dotted 
curve  represents  the  aliased  signal  caused  by  sampling  at  the 
low  frequency  given  by  the  dots 


8.  Goldman,  S.  (1953)  Information  Theory,  Prentice-Hall,  Inc. 

Another  analogy  has  been  suggested  by  G.  Vanasse.  It  is  the  overlapping  orders 
in  a  grating  spectrometer.  The  grating  samples  the  incoming  wave  front  at  an 
internal  corresponding  to  the  grating  constant:  it  is  because  of  this  finite  sam¬ 
pling  interval  tliat  spectra  of  different  orders  are  produced.  If  the  spectral 
bandwidth  is  too  large,  aliasing  (that  is  overlapping  of  orders)  occurs.  In  con¬ 
trast,  the  prism  is  continuous  in  its  effect  on  the  wavefront  and  no  spectral 
orders  appear. 
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Historically,  the  next  breakthrough  was  the  invention  of  the  ''I'a.st  I'oui'ier 
Transforni"  or  Kl'T.  This  made  the  calculation  of  the  PSD  practical  from  m 

p 

economic  point  of  view  and  it  is  describefj  in  great  detail  in  Hrighani'  .is  well  as 
in  many  other  places.  Cooley  and  1  ukey  publi.shed  the  first  algorithm  for  the 
FFT.  Later  on  it  was  realized  that  the  general  idea  was  already  in  the  litera- 
tui'e;  howi'ver,  even  todtiy  few  people  seem  to  realize  that  (laiiss  was  the  verv 
fi  rst  one  to  use  it. 

We  have  arrived  at  the  point  in  this  account  when-  John  Hut  g  made  his  con¬ 
tributions,  which  are  the  MFM-PSD,  and  the  associated  methods  to  obtain  it.  In 
his  Ph.  D.  thesis  (Stanford),  he  made  two  very  important  observations  which  set 
the  stage  for  what  was  to  follow.  F'irst,  he  noted  that  sometimes  the  PSD  which 
was  obtained  by  the  techniques  described  .ibove  could  be  negative.  Theoretically 
such  a  result  is  complete  nonsense.  It  is  a  result  of  the  fact  that  the  sine  func¬ 
tion  lias  negative  sidelobes.  The  cure  is  "window  carpentry",  but  with  loss  of 
resolution.  He  also  noted  that  the  PSD's  obtained  did  not  agi'ee  with  the  auto¬ 
correlations  of  the  data.  In  his  words:  "These  two  affronts  to  common  sense 
were  the  main  reasons  for  the  development  of  maximum  entropy  speetral 
analyst s.  .  .  .  " . 

"If  one  were  not  blinded  by  the  mathematical  elegance  of  the  conventional 
approach,  making  assumptions  as  to  unmeasured  data  and  changing  the  data 
values  th.it  one  knows  (to  be  correct!  would  be  totally  unacceptable  from  a  com¬ 
mon  sense  and,  hopefully,  from  a  scientific  point  of  view." 

I'hrse  .ire  strong  words,  but  they  arr  well  founded.  Since  the  method  he 
■leveloped  rests  on  the  general  principle  of  maximum  entropy,  we  turn  now  to  a 
discussion  of  the  work  of  Jaynes,  which  is  the  foundation  for  the  use  of  this 
principle  in  the  present  context. 

:l.  M \M\U  M  ENTROPY  -  THE  "HIGHER  PRINCIPLE" 

In  1P57,  Fdwin  T.  Jaynes  started  an  unusual  pro.iect  in  which  he  derived 
statislic.il  mechancis  on  the  basis  of  information  theory.  This  work  has  grown 
in  significance  over  the  ye.ir.s.  The  generality  i.s  due  to  the  fact  that  Jaynes  has 
ni.ide  a  fundamental  contribution  to  the  coru  ept  of  probability.  The  original 
approach  to  "probability"  was  due  to  Laplace.  He  regarded  it  as  a  reflection  of 
one's  stale  of  knowledge.  For  example,  suppose  one  is  given  a  coin  to  flip.  The 
question  of  what  probability  to  assign  to  "heads"  and  "tails"  is  decided,  by  Laplace, 
by  the  "principle  of  indifference"  or  "the  principle  of  in.sufficient  reason".  If 


Hrigham.  L.  O.  (1974)  The  Fast  Fourier  Transform.  Prentice  Hall,  Inc. 
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there  is  no  known  reason  to  prefer  either  "heads"  or  "tails",  then  one  assigns  an 
equal  probability  to  them.  The  die  is  similar  in  this  respect.  If  there  is  no 
known  reason  to  suspect  that  it  is  "loaded",  then  one  should  assign  an  equal 
probability  to  any  particular  outcome  of  a  "throw"  of  a  die.  In  this  case  the 
probability  would  be  1/6.  Unfortunately,  if  one  did  know  that  the  die  was  unfair 
or  loaded,  the  Laplace  approach  would  give  no  answer.  As  will  be  explained 
below.  Jaynes  has  shown  us  how  to  extend  the  Laplace  principle  to  the  case  of 
the  loaded  die  and  even  to  far  more  general  cases  (one  particular  one  being  the 
PSDL 

Laplace's  subjectiv'e  concept  of  probability  went  out  of  fashion  when  it  was 
found  to  be  very  limited  in  application  in  the  form  that  he  gave  us.  It  was  re¬ 
placed  by  another  definition  known  as  the  "frequency"  interpretation  given  in 
most  books  on  the  subject  today.  The  generalization  due  to  Jaynes,  however, 
came  at  a  time  w  hen  other  scientists  such  as  Cox.  Jeffreys,  and  Good  were 
reconsidering  the  subjective  approach.  For  an  excellent  history  of  the  subject  of 
probability,  the  reader  should  consult  the  paper  by  Jaynes  given  at  the  MIT 
symposium  on  Maximum  Kntropy  (Levine''^).  Jaynes  was  drawn  to  this  subject 
by  the  following  strange  observation.  Shannon  introduced  the  concept  of  "infor¬ 
mation"  to  communication  engineering,  and  its  mathematical  form  is  essentially 
identical  to  that  of  entropy  in  the  statistical  mechanics  of  Boltzmann  and  Gibbs. 
"Why  should  they  be  the  same?'  is  the  question  Jaynes  asked  himself.  He  was 
later  to  find  the  answer,  which  is  that  they  turn  out  to  be  two  examples  of  the 
application  of  the  same  principle.  In  deriving  this  answer,  he  generalized  the 
Laplace  method  of  assigning  probabilities.  He  was  also  able  to  obtain  results  in 
non-equilibrium  statistical  mechanics. 

To  approach  Jaynes’  work,  this  report  will  use  a  simple  example  that  illus¬ 
trates  the  main  ideas  behind  the  maximum  entropy  concept  as  it  applies  to  sub¬ 
jective  probability  assignment.  The  particular  example  used  is  one  given  by 
Jaynes  himself  in  a  lecture  that  he  gave  at  Brandeis  in  1963  (see  Ref.  11). 

3.1  The  Brandeis  Die 

Jaynes'  example  is  that  of  the  loaded  die.  Over  the  years  it  has  become 
famous  as  the  "Brandeis  die".  It  is  described  as  follows.  Suppose  we  first  con¬ 
sider  a  fair  die.  (P^  =  1/6.)  Let  each  face  be  numbered  according  to  the  number 
of  spots,  so  that  the  average  number  of  spots  per  throw  would  be 


10.  I.evine,  R.  D.  ,  and  Tribus,  M.  (1979)  The  Maximum  Hntropy  Formalism, 

MIT  Press,  Cambridge,  Mass. 

11.  Rosenkrantz,  R.D. ,  (Ed.)  (1983)  B.  T.  Jaynes:  P^ers  on  Probability. 

Statistics  and  Statistical  Physics,  Reidel  Pub.  Co.,  Boston,  MA. 
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U)  p  3.  5 


(If  i  inii'.sf  w  f  have,  by  definition. 


ul'irh  says  that  tho  sum  of  iH  the  [irobabi hties  add  up  to  unity  or  reit.ainty.  liut 
noiv  ronsidcf  ,i  loaded  die.  In  particular,  suppose  that  all  one  knows  is  that  the 
average  number  of  spots  is  4.  5,  that  is 


The  question  is.  what  probability  should  be  assigned  to  P.  given  the  eonstraints 
Kqs.  (2(il  and  (27)^  The  Laplace  type  of  approach  does  not  help  very  much. 
Certainly  it  gives  no  numerical  result.  Let  us  explore  some  possibilities.  Fig¬ 
ure  H.  for  example,  gives  the  correct  .inswer.  for  the  average,  but  it  is  simply 
not  re.isonable.  Figure  9  is  more  honest,  but  it  still  jumps  to  conclusions. 
I’lgure  10  is  even  worse  because  it  is  uneven  without  reason.  I'igure  11  is  the 
best  one  up  to  this  point.  It  is  based  upon: 


(12i  -  7) 
210 


VnfortuiiateU ,  one  c.mnot  employ  a  linear  relation  and  obtain  an  avei'age  greater 
than  5  without  introducing  a  negative  probability.  This  shows  that  the  general 
curve  must  not  be  linear. 

The  above  sequence  illustrates  that,  to  be  as  objective  as  possible,  one  must 
avoid  letting  I'cflerl  more  information  than  is  actually  given.  In  effect,  we 
want  P.  to  reflect  only  the  known  constraints  that  have  been  imposed  upon  it,  and 
then  be  as  uncommittal  .as  possible  with  regard  to  anything  else.  How  then  does 
one  do  this'’  The  due  is  given  by  Shannon's  definition  of  information,  Sj,  given 


P  In  P. 
1  1 


•r.  • 

r  I 
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PROS  PROB 


Figures  8—11.  Various  .Assignments  of  Probabilities  for  Die  Throws.  Figure 
11  is  the  most  fair  of  the  four,  yet  it  too  is  not  general  enough.  Text  gives 
explanation 


Thus,  the  prescription  of  Jaynes  for  the  case  at  hand  is  to  maximize  S  in  Eq.  (29) 
with  the  constraints  of  Eqs.  (26)  and  (27). 

If  this  procedure  is  compared  to  the  one  which  is  used  in  statistical  mechanics, 
one  will  see  the  connection  with  probabilities  based  on  the  frequency  interpreta¬ 
tion.  The  statistical  mechanical  approach  is  based  on  permutations  and  combina¬ 
tions  of  states  under  such  constraints  as:  (a)  there  is  a  constant  number  of  par¬ 
ticles.  (c)  there  is  a  given  average  energy,  and  so  on.  Applying  this  to  a  die, 
one  would  consider  the  number  of  ways  of  getting  throws  of  single  spots,  N2 
double  spots,  and  so  on,  up  to  Ng  throws  of  six  spots  out  of  a  total  of  N  throws. 

The  Bernoulli  Formula  (so  famous  in  statistical  mechanics)  is.  for  this  particular 
case: 

N' 

W  -  - '■1: - 

N.  ;N„'  .  .  N„! 


(30) 


When  \  and  NV  are  all  sufficiently  large  one  can  use  the  well  known  Stirling 
Approximation  for  the  factorial.  This  gives  us 

4  'n  W  3  -y^  (N. /N)  In  (N./N)  (31) 

i 

Here  the  quantities  (N^/N)  represent  "frequencies".  Now,  if  one  maximizes  the 
r  ight  side  of  Kq.  (31),  one  is  assigning  frequencies  that  occur  the  greatest  num¬ 
ber  of  ways.  This  is  exactly  the  method  for  finding  the  frequency  probability 
distributions.  But  this  is  also  exactly  what  is  done  by  maximizing  S  in  Ifq.  (29). 
Thus,  it  can  be  seen  that  the  approach  of  Jaynes  is  consistent  with  the  frequency 
interpretation.  It  is  more  general  in  that  it  can  be  used  even  in  a  case  where 
only  one  "throw"  is  inv’olved.  It  is  also  much  simpler,  and  can  be  considered  to 
be  I  short  cut  for  calculating  "greatest  number  of  ways"  types  of  probabilities. 

In  the  following,  the  Jaynes  formalism  will  be  derived  and  it  will  be  applied 
•o  the  Hrandeis  Die  problem,  .^s  an  introduction,  however,  consider  Figure  12, 
which  represents  a  summary  of  the  formalism.  The  constraints  consist  of  any 
number  of  average  values  that  are  given,  plus  the  fact  that  the  probabilities  must 
sum  to  unity.  By  maximizing  entropy,  it  turns  out  that  the  probability  will  always 
have  the  form  of  an  exponential.  .Another  thing  to  notice  is  that  this  formalism  is 
identical  to  that  found  in  statistical  mechanics,  where  the  quantity  Z  is  called  the 
"partition  function".  Figure  12  is  in  essence,  the  basis  of  the  Jaynes  program. 

It  also  underlies  the  MEM  technique  for  the  PSD,  as  will  be  seen. 

3.2  Derivation  of  Jaynes’  Formalism 

Jaynes'  procedure  was  to  maximize  S  in  Eq.  (29)  with  the  constraint  condi¬ 
tions 

Pi  M«i)  =  (r  =  1 . n)  (32) 

i 

and 

2]  R  =  1  =  (33) 

i  i 

where  fp  =  1  and  (Iq)  =  1.  Note  that  we  used  f^  =  1  to  make  the  notation  uniform. 
To  maximize  entropy  (or  minimize  information)  under  our  constraints,  we  must 


PROBLEM: 


TO  FIND  Pj  FOR  "i"  MUTUALLY  EXCLUSIVE  AND  EXHAUSTIVE  OUTCOMES, 
i  IS  AN  INTEGER 

GIVEN. 

1  -  I  P|  -  1 

i 

2  ~  AVERAGES  OF  i, 

I  Pj  fflBj)  -  <  VSi))  (1  S  '  < 

■  STATE  OF  NATURE 


THE  MEM  ANSWER: 


Figure  12.  The  Jaynes  Formalism 
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use  Lagrange  multipliers.  [See  Page  .  ]  Thus,  the  variation  is  taken  and  set 
equal  to  zero: 


«lSj  -Uo<fo)  •  M<^1>  ■  ^  °  (34) 

The  pq  and  constants  are  the  Lagrange  multipliers.  Inserting  for  S  and  the 
<f,)'s  from  Eqs.  (29).  (32),  and  (33) 

6  ^  [P.  In  P.  +  Pq  P.  +  1^(6.)  P.  +  , . .  ,  ]  =0  (35) 

j 

The  variation  is  taken  with  respect  to  P^.  For  this  reason  each  square  bracket 
for  a  given  j  must  vanish.  Consider  the  jth  term.  We  have 

12.  Page,  L.  (1935)  Introduction  to  Theoretical  Physics.  Van  Nostrand  Co. . 
N.Y,,  p.  311.  - 


6P.  In  P.  +  +Af  (0)+...  =  0 

•  \  J  •’ 


If  we  now  call  Aq  =  1  +  (Jq,  we  obtain 


P.  =  exp  (-  Aq  -  A^  f^(0.)  -  . .  .  1  (37) 

which  gives  us  the  form  of  the  probability  and  explains  why  it  must  be  exponential. 
The  next  question  is,  how  does  one  obtain  Equation  (37)  can  be  rewritten 


P.  -  e  “  e 


-Aq  -A^fi(«j)  -A2f2(0^) 


and  B:q.  (33)  can  be  written  as 


3  e  Z 


where 


-Aifi(^.)  ^-A2f2<e.) 


V-  ♦ir’' 


Hence  we  have 


Xq  -  In  Z 


To  obtain  (and  in  general)  we  first  consider 

-Xf,  -x,f,(e.) 

(fi)  "  '  •*• 

j  3 


<fl) 


1 

•Z 


E'l 


(e.)  e 


-X,f,(fl,) 


n  1 


3 


(43) 


from  Eq.  (41),  hence 
-a(ln  Zl 

<'.'*)>>  ■  -aiq—  <«' 

that  is,  one  can  obtain  Eq.  (43)  from  Eq.  (44)  and  therefore  they  are  the  same. 

At  this  point.  Figure  12  has  been  explained.  Considering  the  number  of  uses 
it  has  found,  it  is  quite  simple  and  straightforward.  (See  the  Jaynes'  references 
in  Rosenkrantz^^  for  an  indication  of  the  scope  of  the  applications.) 


3.3  Solution  of  the  Brandeii  Die  Problem 

It  is  very  instructive  to  work  out  one  case  in  complete  numerical  detail.  We 
will  solve  the  Brandeis  die  problem. 

As  was  shown,  Pj  is  given  by 


P. 

3 


^-x,f(e.) 


1 


where 


the  number  of  spots  on  a  face.  The  Z  is  given  by 


(45) 


Z 


3  3  =  1 


(46) 


'^1 

To  simplify  calculations  define  x  *  e  .To  reduce  the  six  member  sum  in  Eq. 
(46)  to  a  simpler  expression,  we  consider 


1  +  X  +  X  + 


(47) 


. . .  > .  A- V 


O  -  - 
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*hich  (Mil  be  obtained  from  the  geometric  senes  expansion.  liquation  (47)  doe.s 


not  terminate.  We  w.ish  to  obtain  the  finite  sum  given  by  x'.  F 


irst  consider 


nl 


1  -  X 


m+ 1 


1  -  X 


1  +  X  +  .  .  .  +  X 


(48) 


This  is  the  same  as  Fq.  (47)  but  the  term  x'^^\/(l  -  x)  is  subtracted  from  it.  To 
get  rid  of  the  number  1  on  the  right  side,  we  subtract  unity  from  both  sides: 


1  -  X 


m+1 


1  -  X 


x(l  -  x") 

(1  -  xi 


1  (where  m  -  6) 


(49) 


The  expression  for  Z  is  now  manageable. 

We  are  given  the  fact  that  the  average  number  of  spots  is  4.  5  instead  of  3.  5. 
The  latter  would  be  the  result  of  a  fair  die.  Thus  the  formalism  shows  that,  to 
obtain  Xj.  we  may  use 


-  9(ln  Z)  .45,  1  -  7x'''  4  fix*^ 

‘  (1  -  x)(l  -  xS 


(50) 


and  a  hand  calculator  can  be  used  to  solve  Eq.  (50)  numerically.  A  more  conven¬ 
ient  form  for  Eq.  (50)  is 

-  5x^  +  9x  -  7  0 

The  solution  is  x  1.449  =  e  ,  and  Xj^  -  -0.371,  Z  =  26,  66,  and  since 

(•■i  ■  t) 

(P,,  ...  P,.)  MO.  054,  0.079,  0.114,  0.  165,  0.239,  0.347) 
i  1) 

It  should  be  mentioned  that  when  one  makes  the  transition  from  the  discrete 
case  to  the  continuous  one,  the  definition  of  entropy  is  no  longer  that  given  above. 
Jaynes  pointed  this  out  and  has  shown  how  to  solve  the  difficulty  (Shannon,  it  seems, 
did  not  discuss  this  difficulty).  The  solution  involves  continuous  transformation 
groups.  The  interested  reader  may  consult  Jaynes  for  further  information. 


(51) 

(52) 

(53) 
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The  work  of  Jaynes  remains  a  matter  of  some  controversy  at  this  time. 

13 

Prof.  Abner  Shimoney  et  al  use  a  rather  elaborate  mathematical  treatment  which, 
they  assert,  proves  that  the  maximum  entropy  formalism  is  not  universally  cor¬ 
rect.  In  fact,  they  claim  that  it  works  only  for  a  very  highly  limited  set  of  prob¬ 
lems.  Their  argument  seems,  to  this  writer,  mainly  directed  to  the  applications 
in  statistical  mechanics.  A  mathematical  proof  that  MEM  procedures,  as  used  to 
obtain  PSD,  are  rigorously  valid  seems  to  be  needed.  On  the  other  hand,  one  can 
hardly  dispute  the  success  of  MEM  in  PSD  analysis;  and,  there  is  the  likelihood 
that  the  mathematical  objections  will  be  resolved  in  the  future.  Furthermore,  it 
will  be  shown  that  the  manner  in  which  MEM  is  employed  in  this  report  allows  it 
to  be  replaced  by  a  "least  square -error"  approach,  which  in  turn  leads  to  exactly 
the  same  answers.  In  this  sense  one  knows  that  there  is  little  to  fear  regarding 
the  controversial  aspects  of  the  general  theory  of  MEM  analysis  in  the  context  of 
the  applications  to  be  discussed.  * 


3,4  The  MEM  Spectrum 

Initially,  one  is  given  the  autocorrelations,  which  will  be  labeled 
(-m  <  k  <  +  m)  where  m  is  the  maximum  lag.  The  Wiener-Khintchine  Theorem 
relates  the  PSD,  *(10)  to  the  by 


/  du, 


where  s  2^1^^  =  1/ At  is  the  Nyquist  frequency  and  At  is  the  sampling  interval. 
Eq.  (54)  gives  us  our  "constraints",  and  the  next  step  is  to  maximize  the  entropy. 
This  entropy,  or  rather  the  "entropy  rate",  is  related  to  the  PSD  by,  essentially. 


k  ^  •  « 


In  9(u)  du 


13.  Shimony,  A.,  and  Dias,  P.  M.  (1981)  A  critique  of  Jaynes'  maximum  entropy 
principle.  Advances  in  Applied  Math.  ^  172-211. 

A  discussion  of  Maximum  Entropy  as  a  special  case  of  Maximum  Likelihood  will 
be  found  in  Ch.  8  of  Deconvolution  by  P.  A.  Jansson.  This  chapter  was  written 
by  B,  Roy  Frieden.  Perhaps  this  will  limit  the  objections  of  Shimony  et  al. 
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Derivations  of  this  equation  are  in  Burgs'  thesis'^  and  in  Haykin.  These  der¬ 
ivations  assume  that  the  P  associated  with  $(u))  are  "multivariate  Gaussian". 

t  15 

Javnes,  however,  shows  that  his  formalism,  (Fig.  12)  leads  directly  to 


P(y)  ^  e 


(56) 


where  y  is  a  vector  representing  the  time  series.  Thus  MEM  already  implies 
that  P(y)  is  multivariate  Gaussian.  In  other  words,  it  was  not  really  necessary 
for  Burg  to  assume  a  "Gaussian  random  process".  Equation  (5(i)  is  a  Gaussian, 
because  the  definition  of  makes  the  exponent  quadratic. 

We  now  perform  the  variation  to  get  the  MEM  PSD,  Lagrange  Multipliers 
will  be  used,  just  as  in  the  previous  case. 


In  $(u) 


m  / 


-iuAt, 


1] 


d(j  , 


(57) 


k=  -m 

The  variation  is  taken  over  *(u).  The  result  is 


/ 


1 

4<u) 


-  E 


iwAt. 


k=  -m 


fi*((j)  =  0 


(58) 


Again,  since  each  6  (u)  for  each  u  is  independent,  each  square  bracket  must  equal 
zero,  and  this  gives 


4>(w)  =  — - - - -  .  (59)* 

^  i(wAt  ) 

E 

k^-m 


which  is  the  MEM  spectrum.  At  this  point  the  values  of  the  A.  are  unknown.  To 
obtain  them,  one  must  resort  to  a  technique  called  Levinson  Recursion,  which 


14,  Haykin,  S.  (Ed. )  (1979)  Nonlinear  Methods  of  Spectral  Analysis,  Springer- 

Verlag. 

15.  Jaynes,  Edwin  T.  (1982)  On  the  rationale  of  maximum -entropy  methods, 

Proc.  IEEE  7£(No.  9):939.  The  issue  is  entitled  Spectral  Estimation. 

The  coefficients  here  are  related  to  the  a's  given  in  the  MEM  equation  of  the 
introduction  through  an  autocorrelation. 


depends  on  Toeplitz  matrixes.  These  will  be  explained  below:  however,  it  is 
first  necessary  to  explain  the  "Z  transformation." 

4.  THE  Z-TRANSFORMATION 

The  following  will  provide  the  reader  with  the  minimum  background  needed 
for  understanding  the  remainder  of  this  report.  The  books  by  Claerbout, 
Oppenheim  and  Schafer,  Bracewell.  and  Robinson  and  Treitel  are  suggested  for 
further  reading.  The  text  below  will,  for  the  most  part,  follow  the  treatment  of 
Claerbout.  Since  the  Z-transform  forms  the  basis  of  the  MEM  approach  to  the 
PSD,  it  follows  that  it  also  explains  the  basic  shape  of  a  spectral  line  obtained 
from  MEM.  This  shape  is  the  "Lorentzian"  distribution.  Since  the  latter  corre¬ 
sponds  to  the  shape  found  in  atomic  and  molecular  radiation,  the  coincidence 
appeared  to  Jaynes  as  magical.  Of  course,  there  is  no  magic.  This  result  is  an 
inherent  feature  of  the  Z-transform  formalism. 

The  Lorentzian  shape  of  a  spectral  line  results,  as  we  shall  see,  from  the 
location  of  a  "pole  near  the  unit  circle"  (which  relates  to  damped  sinusoidal  oscil¬ 
lations)  as  will  be  explained.  This  Lorentzian  feature  also  explains  another  very 
important  aspect  of  MEM  spectra.  Unlike  the  spectra  obtained  by  conventional 

methods  where  the  height  of  the  spectral  line  is  proportional  to  the  power,  the 
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MEM  approach  (see  Lacoss,  and  Burg's  Ph.  D.  thesis  )  results  in  a  line  that 
must  be  integrated  to  obtain  a  quantity  proportional  to  power.  Another  interesting 
distinction  is  that,  in  conventional  spectra,  the  resolution  of  the  spectrum  is 
determined  by  the  length  of  the  data  record  whereas  in  the  MEM-PSD  formalism 
It  is  determined  by  pole  position. 

4.1  Definition  of  the  Z-Tran<foim 

Consider  a  function  of  time  sampled  at  discrete  intervals,  At,  which  we  shall 
for  the  sake  of  convenience  take  as  unity.  (See  Figure  13.)  The  Z  transform  of 
this  time  function  is  obtained  by  multiplying  each  ordinate,  a^,  of  the  time  series 
by  successively  higher  powers  of  Z  as  in 

F(Z)  ^  a^^  Z'^  (60) 

n 

Iti.  Lacoss,  R.T.  ( 1971)  Data  Adaptive  Spectral  Analysis  Methods  in  Ref.  7, 

(p.  134),  of  Main  References. 

17.  Burg,  John  P.  (1975)  Maximum  Entropy  Spectral  Analysis,  Ph.  D.  Disser¬ 
tation.  Dept  of  GeopTiysics,  Stanford  University. 


where  in  Figure  13  we  have  F(t^)  =  a^.  Another  convention  consists  of  using 
negative  powers  of  Z  for  the  expansion,  as  mentioned  earlier  when  Fourier's 
contribution  was  discussed.  The  one  that  will  be  used  here  follows  that  given  by 
Laplace.  The  discrete  Fourier  Transform  is  obtained  from  Eq.  (60)  by  making 
the  substitution  Z  =  e^*^  =  e^*^  (since  At  =  1  here).  In  some  conventions  the 

negative  exponent  is  used  in  the  substitution.  (Robinson  and  Treitel,  for  example, 
use  this  and  it  will  be  employed  in  later  sections. ) 
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Figure  13.  Signal  Sampled  at  Discrete  Times.  At  =  1,  thus  times 
are  given  by  integers 


The  fact  that  the  coefficients  of  each  power  of  Z  in  the  expansion  [Eq.  (60)) 
give  the  Inverse  Fourier  Transformation  (when  Z  =  e^*^)  comes  as  a  surprise  to 
most  people.  This  is  because  in  ordinary  Fourier  Analysis,  the  inverse  is  ob¬ 
tained  by  calculating  an  integral  which  can  be  difficult  to  evaluate.  In  contrast, 
in  the  present  case  one  merely  needs  to  "point  to  a  coefficient".  Why  does  the 
formalism  operate  in  such  a  simple  manner? 


To  examine  the  formalism,  insert  Z 
F(z  =  e**^)  H  A(u) 


in  Eq.  (60)  and  call 


.«  •- 


V  :Sv-ivSv>:> 


rr’ 


The  inverse  is  defined  as  (from  Fourier  Transform  theory) 


00 

at  =  y*  A(u)  e'*“‘  du 
-00 


(62) 


Next  we  employ  a  key  observation.  It  is 

1  1  I  ■  (l.ifn  =  0 

X  f  z"  du  =  ^  f  e'"“  du  =  {  (63) 

^  J  ^  J  U.ifn=«=0 

-IT  -JT 

Notice  that  we  now  limit  our  integral  to  the  unit  circle.  This  is  because  we  are 
considering  discretized  functions  with  implied  Nyquist  limits.  The  function  sim¬ 
ply  repeats  outside  of  these  limits.  The  inverse,  a^,  then  becomes 

-ir 

Thus,  this  shows  us  one  of  the  reasons  why  the  Z  transform  brings  about  unex¬ 
pected  simplicities. 

A(w)  in  Eq.  (61)  can  be  viewed  in  an  alternative  manner  which  gives  it  an 
interpretation  familiar  to  people  in  the  field  of  optics.  If  Aiu)  were  constructed 
by  successively  adding  terms  in  Eq.  (61)  it  would  take  on  more  "structure"  as 
each  time  a  sample  is  added.  Such  a  build  up  of  a  spectrum  is  a  synthesis 

method  used  in  some  optical  instruments,  according  to  G.  Vanasse,  and  the  gen- 

1 8 

eral  concept  is  discussed  at  the  end  of  the  first  chapter  in  Bracewell. 

4.2  The  Z-Tranaform  Convolution  Theorem 

We  wish  to  show  that,  in  analogy  to  the  Fourier  Transform  theory  of  linear 

4  1^9 

systems  (see  Lee  or  Hsu  ),  there  is  a  convolution  theorem  for  the  Z-trans- 
formation:  "The  product  of  two  Z-transforms  is  equal  to  the  Z-transform  of  the 
convolution".  Figure  14  shows  a  block  diagram  for  a  filter,  B,  with  an  input  x 
and  output  y.  Let  B{Z)  be  the  Z-transform  of  the  impulse  resfwnse  of  the  filter 


18,  Bracewell,  R.  (1965)  The  Fourier  Transform  and  Its  Applications,  McGraw 

Hill,  Chapter  1. 

19.  Hsu,  H.  P.  (1970)  Fourier  Analysis,  Simon  and  Schuster  (Tech  Outlines). 
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Figure  14.  Input  X  Fnters  Filter  H.  and 
Fmerges  as  Output  Y 


and  let  V(Z)  and  X(Z)  be  the  Z-transforms  of  output  and  input  respectively.  Then, 
by  analogy  to  the  continuous  case,  we  have 


N 


y(z) 


X  Z 
n 


b  Z“ 
n 
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**  n 


n  =  0 


n=0 


n  =  0 


(G5) 


Consider  a  simple  case  where  the  filter*s  maximum  length,  N,  is  given  as 
N  =  2  (3  terms).  Then 

Y(Z)  =  (Xq  Z°  +  Xj  Z^  +  Xg  +  , , . )  <bQ  Z°  +  zW  bg  Z^ 

=  Xg  bo  Z°  F  (Xj  bo  +  Xo  b^)  +  (Xg  bo  +  bj  +  xo  bg)  Z2 

00 

+  (Xg  bo  F  Xg  b^  +  Xj  bg)  Z^  .  .  . .  z"  (66) 

n=0 

Now  identifying  the  coefficients  of  powers  of  Z  we  obtain 
2 

n-0 


or  in  general 
N 

*k-n  '^n 


167(b)] 


The  product  of  the  coefficients  of  powers  of  Z  gives  one  a  convolution  in  the 
coefficients.  This  is  an  interesting  fact.  The  reason  is  that,  for  each  power  of 
Z,  for  on  the  right,  there  must  be  a  sum  of  products  on  the  left  all  of  which 
involve  that  same  exponent,  n,  of  Z.  For  this  to  be  true,  each  left  member  of 
the  product  must  have  powers  that  decrease  as  the  right  member's  power  of  Z 
increases  so  that  their  sum  of  powers  is  equal  to  n.  This  is  another  reason  for 
the  Z-transform's  strange  simplicity. 

4.3  The  Wiener-Khiiitchine  Theorem 

The  reader  may  recall  that  this  theorem  states  that  the  Fourier  Transform 

4 

of  the  autocorrelation  is  the  PSD,  and  vice  versa  (see  Lee  .  )  We  now  examine 
the  Z-transform  version  of  this  since  it  is  part  of  the  MEM  procedure. 

Consider  B(Z)  above.  We  wish  to  obtain  the  Z-transform  version  of  a  PSD. 
The  PSD  is  |b(u)|^  thus,  using  Z  =  e*“ 


|b(u)1^  3  B((j)B(u)  =  ^  b^  ^  b^  =  B(Z)B(Z‘^) 


Thus  B(Z)B(1/Z)  is  the  Z-transform  version  of  the  PSD.  Notice  that  here  we 
have  been  allowing  b^  to  be  complex  for  generality.  From  this  point  on  b^  will  be 
considered  real.  In  this  case,  the  PSD  becomes  B(Z)B(1/Z).  We  define  the 


autocorrelation,  r  ,  as 
n 


^  Vk 


(Z 


Next  one  multiplies  out  B(Z)B(1/Z)  above  to  obtain 


B(Z)B( 


l/Z)  .  ^  r”  (z„t  f ) 

-N  n=0  ' 


To  see  why  this  is  true,  consider  again  the  three  term  filter.  Then 


B(Z)B(1;  Z)  ^  (bp  +  Z‘^  +  h.^  Z'^) 


^ -  +  (b^bp  +  b^b^  +  b2b2)Z 


(70b) 


+  (bpbj  +  bjb2)Z^  +  (bQb2)Z^ 


In  Lq.  (70)  one  substitutes  Z  =  e^*^  and  makes  use  of  the  trigonometric  identity 


2  cos  9  =  e^^  +  e 


(71) 


Thus 


B(Z)B(1/Z)  =  ^  r^l2  cos(n(j)) 


(72) 


n^O 


This  is  the  Wiener  Khintchine  relation  for  a  real  time  series  (compare  the  dis¬ 
cussion  of  the  Blackman-Tukey  approach  where  Eq,  (72)  is  used  to  get  the  PSD) . 
Again,  there  is  a  surprising  simplicity  here. 

*iunt 

4.4  The  Z-Transform  o(  e  ^ 

This  discussion  is  designed  to  serve  two  purposes:  (1)  to  give  one  a  ''feel'' 
for  the  Z-transform.  and  (2)  give  the  essential  explanation  of  the  MEM  Lorentzian 
line  shape.  The  left  side  of 


1  t  e 


Z  +  e 


*2i(an  Q  -3iu_  ^ 

Z‘^  e  e  ^  ^  = 


-IWn 


1  -  Z  e 


(73) 


represents  the  Z-transform  of  e  which  is  turned  "on"  a-  t  =  0  and  is  sampled 
at  intervals  of  At  ^  1.  (Such  a  sequence  is  called  a  "causal"  one,  since,  when  the 

concept  is  applied  to  a  filter's  impulse  response,  the  filter  that  results  operates 

20 

only  on  past,  and  not  future,  inputs.  See  Robinson  and  Treitel.  The  right  hand 
side  is  obtained  from  the  Bernoulli  or  geometric  expansion  used  in  reverse  (see 


20.  Robinson,  E.  A. ,  and  Treitel,  S.  (1980)  Geophysical  Signal  Analysis. 


Expanding,  completing  the  square,  and  using  2  cos  6  =  +  e 


A(Z)  A(l/Z) 


1  - 


(l  +  e) 


ri  ~  e)  “  cos(u-Uq)) 


(80) 


The  last  parentheses  can  be  rewritten,  by  using  a  trigonometric  identity,  as 

2 

9  (u  -  Wq)  (u)  -  Uq) 

2  sin"'  - 2 -  .  which  in  turn,  is  approximately  - 2 -  for  small  argument. 

Using  Eq.  (70),  we  get  the  PSD  from 


B(Z)B(1/Z)  =■ 


Thus. 


B(Z)B(1/Z)  = 


1 

1 

[  ^  1 

P  ,  4(1  +  e)  /“  ■  “o\  2 

A(Z)A(1/Z)  j 

^1  +  e 

(1  +  e)2  \  2  / 

(1  +  e)^ 

1 

e  +  4(1  +  e) 


+  (w  -  Uq) 


(81) 


(82) 


where  e  is  taken  as  very  small  for  the  right  side  of  the  approximation  sign.  This 
is  the  Lorentzian  shape  as  promised. 


5.  A  COMPARISON  BETWEEN  THE  AR  AND  MEM  SPECTRA 

There  are  two  purposes  for  this  comparison.  The  first  is  to  give  the  reader 
an  intuitive  interpretation  of  the  MEM  PSD  given  in  Eq.  (59).  The  form  of  the 
latter  is  always  strange  to  someone  who  is  familiar  with  the  conventional  approach 
but  not  with  the  MEM  formalism.  The  second  purpose  is  to  provide  background 
information  to  be  used  below  in  the  discussion  of  the  calculation  of  the  unknown 

coefficients,  X,  ,  or  their  equivalent,  in  the  MEM  PSD,  In  the  following  part  of 

^  20 
this  section,  the  approaches  of  Robinson  and  Treitel  pp.  (401-405),  and 

2 1 

pp.  (444-446),  as  well  as  Robinson  pp.  235-241  will  be  employed. 

First,  recall  the  discussion  of  Yule's  work  on  auto-regressive  (AR)  analysis. 
From  Eq.  (14),  (using  y  instead  of  x  and  switching  to  subscripts) 


21.  Robinson,  E.A.  (1967)  Statistical  Communication  and  Detection,  Hafner. 


n=l 

represented  the  AR  process  and  e,  represented  noise  with  mean  equal  to  zero,  and 
2  * 

variance  equal  to  a  .  The  Z  transformation  of  Eq.  (83)  is 


Y(Z) 


=  e(Z) 


(84) 


or,  with  ajj  s  1, 


Y(Z)  = 


e(2) 


m  =  0 


(85) 


or 


Y(2) 


e(Z) 

■  A(zy 


(86) 


The  PSD  is  obtained  from  Y(Z)  Y(l/Z)  and  Z  =  e  (where  we  have  changed  our 
convention) 


|y,„,P  .  JtlElL  - 


(87) 


(At  s  1  here)  where  use  was  made  of  Eq.  (17)  and  the  fact  that  the  variance  of 
2 

e(Z)  is  a  .  Equation  (87)  is  the  basic  AR  PSD,  The  MEM  PSD,  rewritten  here 
for  comparison,  is 


The  change  from  z  =  e'“  to  z  =  e  merely  changes  the  sign  of  the  complex 
exponential  in  the  definition  of  the  Fourier  Transform  and  its  inverse.  There 
is  no  fixed  convention  in  the  literature. 


W^rntm^m  jm  p  i 


l4>(u)|^  =  - - -  (88) 

^-iu(kAt) 

k=  -m 

Notice  the  similarity  in  form  between  Eq.  (88)  and  the  expansion  of  Eq.  (87) 

(see  Eq.  (85)].  It  will  be  explained  below  that  they  are  identical.  In  any  case, 
the  derivation  of  Eq.  (87)  given  here,  gives  another,  perhaps  more  intuitively 
satisfying,  derivation  of  this  form. 


6.  THE  TOEPLITZ  EQUATIONS  FOR  THE  MEM  PSD  “COEFFICIENTS" 

The  constants,  in  Eqs.  (87)  or  (88)  remain  to  be  determined.  The  dis¬ 
cussions  given  below  and  in  the  next  section  (Section  7)  are  devoted  exclusively 
to  this  subject.  Section  6  derives  the  main  equations  (Toeplitz)  which  relate 
the  coefficients  to  the  autocorrelations  of  the  data,  and  Section  7  describes  the 
ingenious  technique  for  the  solution  of  these  equations. 

6.1  Derivation  of  the  Toeplitz  Equations 

According  to  a  theorem  by  Fejer  (which  will  be  discussed  more  in  Section  6.  2 

21 

and  which  is  discussed  at  length  in  Robinson,  p.  235,  one  can  write  the  right 
hand  side  of  Eq,  (88)  (in  Z-transform  form)  as 


<I>(Z)  = 


_ m 

aJzTaJUz)  ■ 


(89)* 


where 

Aj„(Z)  s  1  +  ajZ+  ...  +  a^  Z'"  .  (90) 

and  the  coefficients,  a^,  are  presumed  to  be  real.  Here,  A^(Z)  is  called  the 

prediction  error  (P.  E. )  filter.  This  is  a  reasonable  term  to  use  since,  in 

Eqs.  (84)  and  (85),  t(Z)  is  the  prediction  error.  Further  explanation  will  be  given 

below  when  Wiener's  theory  of  linear  prediction  is  discussed.  Returning  to 

Eq.  (89)  and  the  E-ejer  Theorem,  the  latter  shows  that  A„(Z)  and  A„(l/Z)  can 
^  ■'  mm' 


In  the  treatment  the  convention  At 

numerator  would  have  been  <j^  At, 

m 


1  is  employed.  If  it  had  not  been,  the 


36 


put  into  a  form  which  is  called  "factored".  "Factored"  means  that  one  can 
arrange  all  the  roots  of  A„(Z)  and  A„(l/Z)  in  a  manner  that  A„(Z)  has  all  of  its 
roots  outside  of  the  unit  circle  and  A^(l/Z)  has  all  of  its  roots  inside.  This  will 
be  discussed  more  in  Section  6.2.  (The  reader  will  recall  the  discussion  of  the 
single  MEM  spectral  line  and  how  important  the  position  of  a  root  can  be  for  sta¬ 
bility.  )  Unfortunately  it  is  not  possible  to  give  a  comprehensive  discussion  of 

the  concept  of  "minimum  phase"  in  this  report.  (For  this  the  reader  may  consult 

22  20  3 

Robinson  and  Silvia.  Robinson  and  Treitel  and  Claerbout.  )  Nevertheless,  it 

should  be  mentioned  that,  when  the  "factorization"  is  carried  out.  A  (Z)  is  called 

’  m 

"minimum  phase"  or  "minimum  delay"  (the  two  terms  are  synonymous)  and,  in 
contrast.  A^(l/Z)  is  called  "maximum  delay"  or  "maximum  phase". 

The  next  step  in  the  derivation  consists  of  multiplying  both  sides  of  Eq.  (89) 
by  A(Z)  giving 

*<2).VZ)  .  (91) 

The  left  hand  side  is  the  product  of  the  Z  transforms  of  *  and  A.  It  was  shown 

dbr  ve  that  this  is  identical  to  the  Z  transform  of  the  convolution  of  *  and  A. 

2 

Novs  .  onsider  the  right  hand  side  of  Eq.  (91).  We  have  a  constant  or  divided  by 
■\(1  Z).  which  IS  a  polynomial  with  all  of  its  zeroes  inside  of  the  unit  circle.  As 
will  he  shown  in  Section  6.  2.  (A(l/Z)]  is  stable  in  the  expansion  of  powers  of 
/.  The  next  step  is  to  use  the  Wiener -Khintchine  Theorem  in  its  Z-Transform 
form 


«(Z)  ^  (92) 

-00 

where  are  the  autocorrelations.  First,  we  expand  the  inverse  of  A(l/Z)  on  the 
right  hand  side  of  Eq.  (91)  to  obtain 

*(Z)  A(Z)  =  0^(1  (terms  in  neg.  powers  of  Z)]  (93) 

and  then  use  Eq.  (92)  to  replace  «(Z)  on  the  left  side  of  Eq.  (93).  Next  we  equate 
the  coefficients  of  like  powers  of  Z  on  both  side  of  Eq.  (93)  remembering  that 


22.  Robinson,  E.A, ,  and  Silvia,  M.  T,  (1978)  Digital  Signal  Processing  and  Time 
Series  Analysis,  Holden  Day,  Inc. 


6  =6  for  real  series.  It  will  not  be  done  here,  but  it  is  simple  to  carry  out. 

The  result  is  m  +  1  simultaneous  linear  equations  that  can  be  written  in  matrix 
form  as 


•^1  •••  *m-l  '^m 


m  ^m-1  •  •  •  '*'1 


Equation  (94)  is  the  fundamental  equation  that  gives  the  values  of  and  in 
the  MEM  PSD  (Eq.  (88)1 .  The  matrix  on  the  left  side  is  called  a  Toeplitz  matrix. 

Toeplitz  matrix  has  matrix  elements  t^^  that  are  a  function  of  (i  -  j).  This  rela¬ 
tionship,  which  makes  all  the  elements  with  a  constant  difference  between  the 
indexes  identical,  gives  a  Toeplitz  matrix  a  characteristic  "banded"  appearance, 
because  all  terms  in  the  upper-left -to-lower-right  diagonals  are  equal. 

Because  our  autocorrelations  are  even  functions,  the  matrix  representation 
will  be  a  symmetric  matrix,  so  that  this  matrix  not  only  has  the  "banded"  appear¬ 
ance  characteristic  of  a  Toeplitz  matrix,  but  is  also  symmetric.  Another  way  to 
see  it  is  to  realize  that  the  bottom  row  is  the  reverse  of  the  top  row.  and  the 
second  from  the  bottom  is  the  reverse  of  the  second  from  the  top,  and  so  on.  A 
Toeplitz  matrix  gives  rise  to  the  "Levinson  Recursion"  to  be  described  in 
Section  7. 

There  are  two  names  given  in  the  literature  for  Eq.  (94).  On  one  hand  they 
(except  for  the  top  one)  are  called  the  "normal  equations".  As  will  be  shown, 

Eq.  (94)  can  be  derived  from  a  least  squares  approach  to  prediction;  and  "least 
squares"  always  has  its  "normal  equations"  as  is  well  known.  The  second  name 
for  Eq,  (94)  is  the  "Yule-Walker  equations"  after  the  two  authors  who  derived  it 
for  their  work  in  AR. 

One  other  comment  should  be  added  regarding  the  right  hand  side  of  Eq.  (94) 

which  consists  of  nothing  but  zeroes  with  the  exception  of  the  first  member  which 
2 

is  This  structure  arose  because  there  were  no  positive  powers  of  Z  on  the 

right  side  of  Eq.  (93).  This  is  an  unusual  situation  to  a  person  schooled  in  the 

continuous  formalism.  However,  there  is  a  familiar  analogue  of  this,  the  treat- 
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ment  of  causal  and  anti -causal  filters.  (For  example,  see  Guillemin,  )  There, 

23.  Guillemin,  E.  A.  (1949)  The  Mathematics  of  Circuit  Analysis.  Wiley  and 
Sons;  (1963)  Theory  of  Linear  Physical  Systems,  Wiley  and  Sons. 
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the  analogue  to  the  above  treatment  is  accomplished  by  means  of  contour  integra¬ 
tions.  These  contours  either  contain  singularities  or  they  do  not.  When  they  do 
not,  the  integrals  vanish,  and  vice  versa.  This  underlies  for  example,  the  theory 
of  Hilbert  Transforms  for  causal  functions.  We  have  here  yet  another  simplifica- 

4 

tion  due  to  the  Z-transform.  (The  interested  reader  should  also  consult  Lee's 

20 

treatment  of  spectral  factorization  in  the  continuous  domain.  Robinson  &  Treitel 
also  discuss  spectral  factorization, 

6.2  Comments  on  Factorization  and  Minimum  Phase 

The  Fejer  Theorem  was  used  to  get  the  essential  result  [Eq.  (94)].  The 
reader  may  want  to  know  a  little  more  of  what  lies  behind  it.  The  key  elements 
will  be  described  and  references  will  be  provided  for  the  full  treatment.  The 
first  important  fact  is  that  the  filter  represented  by  A^(Z)  is  causal.  It  starts 
at  t  =  0  and  goes  into  the  future.  During  convolution,  it  therefore  acts  exclusively 
upon  the  past  inputs.  Let  us  now  factor  it  in  the  algebraic  sense  [not  in  the  sense 
of  "factoring”  the  quantity  A(Z)  A(l/Z)  used  in  the  previous  section] .  This  gives 


»  ( 


t  1 


I  4 


1...,' 


To  understand  a  little  more  about  "minimum  phase"  and  "maximum  phase",  let 
us  take  the  simplest  case  of  Eq,  (95)  where  only  one  root  is  involved.  The  gen¬ 
eralization  from  this  case  will  not  introduce  any  new  concepts.  The  first  question 
that  arises  is.  When  is  [A(Z)j  stable?  Using  the  simple  case 


*  •.  •  •  •  4 


A(Z)  =  (bp  +  b^Z)  . 


we  obtain  for  the  inverse 


'  »* 

k, _ 


•(l+a'Z  +  (oZ)  +  ...) 


where  the  root  Z^  =  i  and  o  s  (-b^/bp).  Thus  it  is  obvious  that  an  expansion  in 
positive  powers  of  Z  will  be  stable  whenever  o  <  1,  which  means  Zq  must  be  out¬ 
side  of  the  unit  circle  and,  by  a  definition  given  above,  A(Z)  is  minimum  phase. 

"Invertability"  and  "minimum  delay"  or  "minimum  phase"  are  identical  in  their 
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meaning  (p.  157,  Robinson  and  Silvia  ), 
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The  next  question  is.  when  is  [A*(l/Z))  ^  stable  in  an  expansion  of  negative 
powers  of  Z  ?  The  asterisk  denotes  that  we  take  the  complex  coniugi^tp.  The 
reason  that  we  have  switched  to  the  general,  complex  case  is  that  all  treatments 
of  the  Fe.ier  Theorem  (for  example.  Robinson^^)  use  this  form.  Here  one  has 


AHljZ) 


(1  +  (a'Z)' 


(97b) 


The  root  Z_ 


1/a',  o'  =  (-bQ/bj):  and.  the  expansion  converges  only  when  a  >  1 


or  Zq  is  now  inside  of  the  unit  circle.  The  generalization  of  this  was  used  to 
derive  Eq.  (94). 

The  "factorization"  of  A(Z)  A*(l/Z)  reduces  in  essence  to  the  situation  shown 

jjc 

in  Figure  15.  Here  the  two  roots  Z.  and  l/Z^  are  shown,  one  inside  and  one  out¬ 
side  of  the  unit  circle.  In  the  general  case  (Eq.  (95)1.  one  can  arrange  to  have 
all  the  roots  of  1a(Z)|^  into  two  categories,  namely,  those  which  are  inside  and 
those  which  are  outside  of  the  unit  circle.  In  general,  if  one  is  given  only 
I  A(Z)|  then  there  would  be  numerous  ways  of  assigning  the  roots  of  the  compo¬ 
nent  parts.  A(Z).  and  A*(l/Z).  Thus,  |  A(Z)|  ^  would  remain  the  same  no  matter 
how  these  roots  are  assigned.  On  the  other  hand,  making  A(Z)  minimum  delay 
and  A*(l/Z)  maximum  delay  represents  a  unique  assignment  with  desirable 

stability  properties.  The  reader  who  wishes  to  study  the  general  treatment  may 

2 1 

consult  pp.  235-238  of  Robinson  ;  however,  the  above  gives  the  main  ideas  for 
concepts  involved. 

The  concepts  of  minimum  phase  and  spectral  factorization  have  many  impli- 

3 

cations  in  data  analysis.  Chapter  3  of  Claerbout  explains  these  concepts  in  six 
different  ways.  The  above  represents  merely  one  of  them.  It  is  also  discussed 
in  Section  8.3  of  Robinson  and  Silvia  as  well  as  in  Ch.  3  and  Ch.  7  of  their  book. 
In  Ch.  7,  the  so-called  Payley-Wiener  condition  is  described  in  the  clearest 
manner  known  to  the  present  author.  This  condition  is  not  only  a  key  concept 
behind  the  minimum  phase  property  but  it  is  a  condition  which,  when  violated, 
leads  to  a  "deterministic"  time  series.  This  last  point  is  mentioned  for  a  very 
specific  reason  here  (despite  space  limitations).  The  conceptual  foundations 
behind  the  MEM  PSD  rule  out  deterministic  signals.  The  reader  can  understand 
this  when  it  is  pointed  out  that  the  model  for  the  signal  is  the  model  consisting  of 
filtered  white  noise  (or  the  "peas  and  the  pendulum  model").  A  signal  containing, 
for  example,  a  sine  wave  (along  with  everything  else  such  as  filtered  white  noise) 
will  violate  our  model.  It  will  violate  the  Payley-Wiener  condition.  A  MEM 
spectrum  of  such  a  signal  could,  and  should,  give  us  nonsense. 


UNIT  CIRCLE 


I  Z  PLANE 

Figure  15.  Illustration  of  Factorization  of 
Roots  Zj  Into  Minimum  Phase  and  Maximiun 
Phase  Pairs 


fn  view  of  this,  then,  it  is  somewhat  surprising  that  many  of  the  "tests"  of 
the  MEM  PSD  method  to  be  found  in  the  literature  use  sine  waves.  If  the  MEM 
method  is  discredited  by  such  tests,  then,  one  may  perhaps  regard  such  tests  as 
invalid,  and  it  seems  that  they  should  be  ignored.  After  all,  why  should  this 
technique  be  required  to  work  under  the  very  conditions  that  violate  its  assump¬ 
tions  ? 


7.  LEVINSON  RECURSION  (AS  MODIFIED  BY  ROBINSON  AND  WIGGINS) 

This  technique  is  used  to  solve  the  Toeplitz  Equations,  (94).  It  was  first 

developed  by  Levinson  in  order  to  solve  Eq.  (94)  when  it  appeared  in  linear  pre- 
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diction,  as  described  in  Wieners'  book  on  time  series,  Levinson's  work  is 

reprinted  in  that  book  in  the  form  of  appendixes.  Robinson,  in  his  IEEE  review 

article,  has  pointed  out  that  these  papers  of  Levinson  are  masterpieces  of 

explanation  and  that  they  made  it  possible  for  the  engineering  community  to 

employ  Wiener's  work  on  time  series. 

This  technique  of  recursion  is  a  basic  part  of  all  computer  programs  for 

calculating  MEM  PSD's.  It  depends,  as  was  mentioned  previously,  on  the 

2 

Toeplitz  symmetry:  and  the  purpose  is  to  obtain  and  the  a.  in  Eq.  (94),  In 


sfe 

However,  this  issue  is  not  resolved  at  this  time.  The  fact  that  any  sine  wave  is 
in  practice,  finite  may  make  the  Paley-Wiener  condition  not  relevant.  On  the 
other  hand,  Gottman  says  that  the  deterministic  signals  must  first  be  removed 
in  order  that  there  be  a  valid  application  of  these  methods, 

24,  Wiener,  N.  (1949)  Extrapolation,  Interpolation,  and  Smoothing  of  Stationary 
Time  Series,  Cambrit^e,  MA,  Technology  Press  of  the  MIT. 


order  to  explain  the  procedure,  we  shall  commence  with  a  3  X  3  version  of 
Eq.  (94)  and  then  show  how  to  proceed  to  the  4X4  case.  Consider 
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Here  the  Og  is  replaced  by  P2  for  convenience.  The  superscript  2  on  the  a. 
identify  them  as  being  associated  with  a  PE  filter  with  ^2  ss  the  highest 
coefficient. 

One  now  takes  advantage  of  the  symmetry  (the  bottom  row  is  the  reverse  of 
the  top  row  of  the  matrix,  and  so  on).  To  extend  Eq.  (98)  to  a  4  X  4  matrix,  we 
write 
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where  ^  is  to  be  regarded  as  a  discrepancy.  Prom  Eq.  (99) 
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where  a  =  1.  Now  consider  the  4th  order  filter  in 
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(101) 


Next,  we  use  the  "trick",  based  on  the  symmetry  (which  allows  one  to  reverse 
the  vectors  without  reversing  the  matrix)  to  rewrite  Eq.  (99)  as 


As  can  be  seen,  it  is  the  Toeplitz  symmetry  which  insures  that  Eq,  (102)  and 
Eq.  (99)  are  exactly  the  same  equation.  Now  let  us  add  to  Eq.  (99)  a  certain 
fraction  of  Eq.  (102),  given  by  Cg,  and  try  to  reproduce  Eq.  (101). 


Our  unknowns  are  Pg  and  Cg  on  the  right  side, 
vanish,  we  have, 

^  +  Cg  Pg  =  0  . 

or 


To  make  the  bottom  element 

(104) 

(105) 


To  obtain  agreement  for  the  top  element  on  the  right. 


(1 
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where  Eq.  (105)  was  used.  It  must  be  recalled  that  Ag  is  defined  by  Eq.  (100), 
that  is, 
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Similarly,  using  Eq.  (101)  in  connection  with  Eq.  (103)  (left  side) 
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The  above  recursion  from  the  3  X  3  to  the  4X4  case  can  be  generalized  in  an 
obvious  way  to 
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(s  =  0,  1 . N)  ,  and  aj^^  -  C 


The  recursion  is  initiated  by  (using  a  1  by  1  matrix  and  the  general  formulas) 
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(112) 
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A  treatment  is  given  in  Haykin  (Section  2.4)  that  parallels  this  ‘reatment  and 
which  the  reader  may  consult.  The  quantities  Cj^  play  a  crucial  ole  in  what  is 
to  follow  below.  They  are  called  "reflection  coefficients". 


8.  WIENER  PREDICTION  AND  LEAST  SQUARES 

20 

The  following  treatment  follows  Robinson  and  Treitel.  The  original  work  is 

in  Wiener's  "Time  Series",  and  explanations  of  the  continuous  version  are 
4  25  26 

given  in  Lee,  Laning  and  Battin  and  in  Bendat.  The  present  treatment  will 
serve  a  number  of  purposes:  (1)  it  will  give  an  alternative  way  to  understand 
Eq.  (94).  that  is,  the  Toeplitz  Equations,  (2)  it  will  explain  better  why  A^(Z)  is 
called  the  "prediction  error"  (PE)  filter,  (3)  it  will  illustrate  why  the  MEM  PE 
filter  whitens  the  spectrum  until,  at  some  order  of  A^(2),  the  PSD  is  flat, 

(4)  by  showing  that  both  MEM  and  least  squares  lead  to  Eq.  (94)  one  can  convince 
oneself  that  the  objections  of  Shimony  et  al  to  MEM  do  not  apply  to  the  MEM  PSD 
as  we  know  it,  and  (5)  it  will  provide  basic  information  needed  in  order  to  under¬ 
stand  the  so-called  "Burg  Technique"  to  be  explained  in  Section  10. 


25.  Laning,  J.H, ,  and  Battin,  R.H.  (1956)  Random  Process  in  Automatic  Control. 

McGraw-Hill, 

26,  Bendat,  J.S.  (1958)  Principles  and  Applications  of  Random  Noise  Theory, 

Wiley  &  Sons, 


Figure  16  shows  a  block  diagram  that  describes  the  problem  to  be  solved. 

The  input,  x^.,  is  given.  We  wish  to  obtain  predicted  values  of  x^.  at  a  time  a 

later,  or.  in  other  words,  we  want  x. ,  =  2  .  To  accomplish  this,  one  has  a 

t+o  t 

filter  described  in  terms  of  its  weighting  coefficients,  f^.  This  filter  is  to  be 
designed  so  that  its  output  will  be  as  close  as  possible  to  Zj- 


Figure  16.  Prediction  Filter 


To  design  the  filter,  Wiener  chose  to  use  a  "least  square  error"  criterion, 
and  this  is  the  basis  of  the  entire  procedure  of  the  present  section.  The  mean 
square  error  is  given  by  J,  defined  by 


J  s  izj  -  y^)2  (113) 

One  now  applies  the  convolution  theorem  to  describe  the  effect  of  the  filter  in 
Figure  16. 
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(114) 


where  the  asterisk  indicates  convolution.  We  will  use  Eq.  (114)  in  Eq.  (113)  for 
y^  in  the  following.  To  obtain  the  optimum  values  of  T,  one  minimizes  J  by  setting 
the  derivative  of  J  with  respect  to  each  f.  equal  to  zero: 

|f--0  ,  (115) 

^  i 

Before  carrying  this  out,  we  define  the  cross  correlation,  0  (t),  by 

xy 

^xy<^>  ^  *t+T  yt  • 


(116) 


W  V 1 


If  Kq.  (115)lwith  Kq.  (114)  and  Kq.  (113)  substituted  for  and  J)  is  carried  out 
with  i  -  1,  and  if  one  makes  use  of  the  definition  (F'q.  (116)]  for  one  obtains 
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T=0 


(117) 


Here  we  used 

^xx^^  - 

For  the  general  case  of  3J/0F  =  0  one  obtains 

m 

2  ‘^xx^-i  ■  •  <J  =  0.  1 . m)  •  (119) 

T=0 


This  equation  is  the  discrete  version  of  the  Wiener-Hopf  Equation  (compare  with 
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Lee  and  Haykin  )  and  it  constitutes  the  "normal  equations”  of  the  least  square 
formulation  here. 

Eq.  (119)  is  also  used  to  give  a  more  convenient  expression  for  J,  namely 


J  =  <(i,,(0) 
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zx 


T  =  0 


(120) 


[Details  of  the  algebraic  manipulations  left  out  above  will  be  found  in  Robinson  and 
T reitel,  p.  147.  1 

We  now  specialize  to  the  one  step  predictor,  or 


t+1 


(121) 


In  other  words,  we  now  wish  to  predict  one  unit  of  time  in  the  future,  that  is, 
x^  is  the  input  and  we  want  as  the  output  of  the  filter.  For  this  case  we  have 

*zx<)’  =  +  1)  .  (122) 


-r 


47 


and  the  normal  equations,  Eqs,  (117),  become 


Thus  Eq.  (123)  is  the  Wiener-Hopf  one  step  predictor  equation  which,  when  solved, 
gives  the  optimum  coefficients,  T. 

At  this  point  we  shall  change  the  notation  to  the  "prediction  error"  notation 
for  reasons  which  will  become  apparent.  This  is  done  as  follows 
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where  N  s  (m  +  1),  that  is,  we  now  have  an  Nth  order  PE  filter. 
In  terms  of  this  notation,  Eqs,  (123)  become 


ill 

T=0 

We  now  define  p  =  t  +  1  and  multiply  Eq,  (125)  through  by  (-1)  obtaining 
N 

^xx<j  +  1  -  W)  =  0  ,  (j  =  0,  1,  ,  ,  ,  N-1)  , 

p=0 

In  the  new  notation  we  have  Eq,  (120)  in  the  form 
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If  one  combines  Eq.  (127)  with  Eq.  (126)  one  obtains  the  previous  Eqs.  (94): 
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Previously  we  used  ^  (j)  - 

This,  Eq.  (128),  is  a  very  interesting  result  because  we  now  see  that  it  can 
be  obtained  from  least  squares  analysis  without  any  help  from  the  "higher  prin¬ 
ciple"  of  maximum  entropy.  This  implies  that  MEM  and  least  squares  have  a  deep 
connection.  To  this  writer's  knowledge,  this  connection  has  not  yet  received  an 
explanation. 

At  this  point,  let  us  reconsider  the  question  of  why  we  have  been  calling  the 
Sj  in  Eq,  (124)  the  PE  filter  as  contrasted  to  the  Wiener  prediction  filter.  The 
latter  is  given  by  (one  step) 


yt  ^  *t+i  -  2  ^k  *t-k+i 
k=l 


(129) 


This  is  modeled  in  Figure  17a  by  the  part  of  the  system  within  the  dotted  lines. 
In  contrast,  there  is  the  prediction  error  filter.  This  is  given  by 
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(130) 
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This  is  represented  by  the  whole  of  Figure  17a  or  its  re -representation  in  Figure 
17b.  Equation  (130)  explains  the  change  of  notation  given  in  Eq.  (124).  The  out¬ 
put  of  the  PE  filter  is  the  error,  and  when  the  error  is  white,  the  filter  has 
embodied  all  the  predictable  parts  of  the  input  signal.  In  effect,  the  predictor 
filter  represents  the  inverse  of  the  filter  which  is  used  in  the  signal  model  as 
depicted  in  Figure  4, 


t. 


V  y 


PREDICTIVE 

CM  TCD 


DELAY 


Figure  17a.  Wiener  Filter,  f. 
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Figure  17b.  P.  E.  Filter, 
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Lee  (p,  410)  gave  a  very  useful  physical  description  of  the  Wiener  prediction 
filter.  (Note  that  the  general  form  of  Wiener's  Optimum  Filter  did  more  than 
predict.  It  also  did  the  best  job  possible,  in  a  sense,  to  get  rid  of  noise.)  Fig¬ 
ure  18  gives  this  description.  At  the  top  of  the  figure  is  given  the  "model”  for  the 
signal  (or  data).  It  consists  of  a  random  pulse  generator  which  serves  as  input  to 
a  filter  or  "pulse  shape  network"  (they  are  the  same  thing).  Note  the  impulse 
response  drawn  on  the  filter  block.  Up  to  this  point  we  have  what  was  in  Figure  4. 
but  with  more  detail.  Next,  we  consider  the  Wiener-prediction  filter.  The  signal 
first  enters  the  pulse  shape  inverter,  or  the  inverse  of  the  filter  used  in  the  sig¬ 
nal  model.  The  effect  of  this  first  filter  is  to  take  signal  inputted  to  the  inverter 
and  convert  it  back  into  the  original  pulse  noise  of  the  signal  model  (that  is,  the 
output  of  the  top  left  block).  These  pulses  are  then  fed  to  a  new  filter  or  pulse 
shape  network  that  differs  from  the  one  of  the  signal  model  in  only  one  respect. 
The  initial  part  of  the  impulse  response  is  missing,  and  the  amount  missing 
corresponds  to  the  time  elapsed  in  the  predictive  time  interval  which  we  have  been 
taking  as  one  time  step.  Notice  that  if  one  were  to  try  to  predict  too  far  into  the 
future,  the  impulse  response  might  be  too  small  to  be  of  any  value.  Eventually 
it  would  die,  and  the  predictor  would  predict  the  value  zero. 

The  above  physical  description  seems  to  be  quite  helpful.  The  reader  should 
consult  Lee^  to  see  the  proof  that  the  above  is  correct.  What  is  most  remarkable, 
in  the  opinion  of  the  present  writer,  is  the  fact  that  such  an  ingenious  "device" 
was  automatically  "invented"  by  "least  squares"  mathematics.  As  we  have  seen, 
maximum  entropy  accomplishes  the  same  thing. 
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Figure  18.  Y.W.  Lee's  Diagram  of  the  Prediction  Filter.  Note  that  the 
prediction  is  zero  for  times  in  the  future  that  exceed  the  duration  of  the 
impulse  response 


9.  GOOD  AND  BAD  CANDIDATES  FOR  MEM 


Not  all  data  lend  themselves  favorably  to  MEM-PSD  analysis.  There  are 
three  main  situations  which  make  a  specific  set  of  data  a  bad  candidate:  (1)  The 
MEM  signal  model,  that  is.  the  AR  model,  does  not  correspond  to  the  data  source. 
(As  will  be  shown,  there  are  two  other  models.)  (2)  The  data  are  too  noisy. 
(MEM-PSD  analysis  is  especially  vulnerable  to  noise.)  And  (3),  the  autocorrela¬ 
tion  has  decayed  to  the  point  where  no  statistically  reasonable  extension  of  it  will 
be  above  the  noise  level.  Case  (2),  as  will  be  shown,  is  closely  related  to  (1). 
Case  (3)  is  true  because,  as  Burg  has  shown,  one  way  to  understand  how  MEM 
"works"  is  to  think  of  it  as  a  way  to  extend  the  autocorrelation  from  the  given 
values  to  all  the  values  that  lie  outside  the  data  range.  In  other  words,  the  linear 
predictor  implied  in  the  MEM  analysis  is,  in  effect,  used  lo  extend  the  autocorrela 
tion  from  -oo  to  +« .  If  such  prediction  yields  nothing  new,  then  nothing  will  be 
gained  by  MEM  analysis  over  the  conventional  approach, 

9.1  The  Various  Signal  Models 
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In  the  following  we  shall  follow  Robinson  and  Treitel,  Ch  16. 

9.  1.  1  THE  AR  MODEL 

As  we  have  seen,  where  x^  is  input  and  y^  is  output,  the  model  is  given  by 


“i  ■^t-i 
i=l 


The  model  uses  white  noise  as  input,  x^.  In  the  PE  form,  it  is 
N  N 

i=l  i=0 

where  a^  =  1.  Taking  the  Z-transform  of  Eq.  (132)  yields 
Y(Z)  A(Z)  =  X(Z)  , 
where 


(132) 


(133) 


N 


A(Z) 


■L 

i  =  0 


(134) 


We  used  the  convolution  theorem  to  get  Eq.  (133).  Now  divide  both  sides  of  Eq. 
(133)  by  A(Z), 

Y(Z)  =  .  (135) 


From  this  one  can  get  the  PSD  version,  along  the  lines  of  our  previous  discussions, 
by  means  of  |a(Z)|^  s  A(Z)A*  (1/Z)  and  so  forth,  and  arrive  at 


|y(Z)1^ 


|X(Z)|^  (Z  =  e'^*^)  |x((j)P 

|a(Z)|2  1a(u)P 


(136) 


The  above  is  somewhat  in  the  category  of  "review"  material  and  we  have  seen  that 
the  poles  [or  zeroes  of  A(z)]  give  us  the  "lines"  in  the  spectra.  In  this  connection 
the  reader  should  recall  the  discussion  of  the  basic  "MEM  line"  and  its  Lorentzian 
property. 


9.1.2  THE  MA  MODEL 


Let  us  now  consider  a  new  model.  This  is  related  to  the  previous  discussion 
of  the  Slutzky  Effect.  It  is  called  the  moving  average  (MA)  model  which  is  given 

by 


yt 


N 


'^n  *t-n 

n=0 


(137) 


Here  it  is  the  input  values  that  are  weight  averaged  by  the  filter.  (AR  corresponds 
to  "feedback"  and  MA  to  "feedforward"  if  one  wishes  to  use  engineering  termi¬ 
nology).  As  before,  one  takes  the  Z-transform  of  Eq.  (137); 

Y(Z)  =  B(Z)  X(Z)  .  (138) 

and  obtains,  going  through  the  same  manipulations  as  above: 

1y(u)P  =  |b(u)P  |x(u)P  .  (139) 

Here  it  is  the  shape  of  |B(u)i^  that  gives  the  spectrum  its  form  since  lx(u)l^  is 
"white"  or  constant.  The  Blackman  and  Tukey  approach  is  "MA", 

9.1.3  THE  ARMA  MODEL 

This  model  is  a  combination  of  the  above  two  models.  In  this  case  the  model 
is  given  by 

N  M 

yt  =  2  ^^-n-2^^yt-i  • 

n=0  i=l 


and,  for  obvious  reasons. 


|y(u))P  =  lx(u)P  .  (14 

|A(w)r 

Makhoul's  tutorial^"^  calls  this  a  "pole-zero  model"  because  |b(w)1^  can  supply 
zeroes,  and  the  zeroes  of  1a(u))1^  give  poles  since  A  is  in  the  denominator.  He 
also  discusses  the  AR  and  MA  models. 

27,  Makhoul,  J. ,  "Linear  Prediction;  A  Tutorial  Review",  Ref,  7  of  Main  Ref. 
p,  99. 


At  this  point,  Robinson  and  Treitel  make  a  very  important  observation. 

First  they  made  computer  simulations  based  on  the  AR  model,  the  MA  model,  and 
the  ARMA  model.  Then  they  obtained  the  PSD's  for  the  three  cases  based  on  the 
known  signal  models.  Next  they  used  procedures  to  obtain  the  PSD's  from  the 
data  themselves.  The  procedures  were  based,  in  each  case,  on  the  three  different 
models,  that  is,  MEM  for  AR,  Blackman  and  Tukey  for  MA,  and  a  procedure 
known  as  "Box  Jenkins"  for  ARMA,  Figure  19  shows  their  results.  In  each  case 
where  the  simulation  model  matched  the  method  of  analysis,  the  match  is  perfect 
or  very  good.  In  the  other  cases,  the  results  could  be  invalid.  In  particular,  it 
should  be  pointed  out  that  AR  gave  a  narrow  line  whereas  the  MA  treatment  gave 
the  correct  shape  in  the  case  where  the  signal  was  generated  by  a  MA  process. 
Thus,  we  see  that  AR  could  give  a  false  "high  resolution"  in  this  case. 

MEM  works  best  when  the  AR  model  fits  the  data.  Fortunately,  there  is  some 
relief  from  this  problem.  A  theorem  has  been  proven  in  the  literature  which  says 
that  all  the  methods  converge  to  the  same  PSD  result  provided  that  sufficiently 
high  order  is  used.  It  may  therefore  be  much  more  practical  to  use  MEM  for 
certain  types  of  ARMA  case,  since  the  latter  is  relatively  difficult  to  carry  out. 


9.2  Noiac 
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As  Kay  and  Marple  have  pointed  out  most  clearly  in  their  excellent  review 
article,  noise  is  very  effective  in  corrupting  MEM  spectra  because  it  has  the 
effect  of  changing  the  signal  model.  To  see  this,  let  y^  be  a  noise  corrupted  AR 
process  called  x^.  Then 


n  n 


(142) 


where  w  represents  the  white  noise  of  variance  a  and  is  uncorrelated, 
n  w 

PSD  of  y,  called  in  Z  transform  notation,  is  given  by 


The 


"  AiZ^ A*(\/Z)  *  % 


(143) 


This  expression  should  be  clear  from  the  previous  discussions.  The  At  here 
simply  gives  one  a  different  dimensional  form  that  is  sometimes  useful,  and  it  is 
the  form  Kay  and  Marple  used.  This  can  be  rewritten  as 


28.  Kay,  S.  M, ,  and  Marple,  S,  L.  (1981)  Spectrum  analysis —a  modern  perspec¬ 
tive,  Proc,  IEEE  69,  pp.  1380-1419, 


a  +  c  (A(Z)  A*(l/Z))>  At 


Figure  20  (taken  from  Kay  and  Marple)  shows  the  result  of  an  experiment 
which  involves  two  values  of  added  noise.  When  the  Signal  to  noise  ratio  is  high, 
the  lines  are  very  clearly  resolved  (left  side).  Lowering  the  signal  to  noise  ratio 
(SNR)  from  25  db  to  5  db  washes  out  all  the  good  resolution.  One  concludes  that 
"noise  is  very  bad  for  MEM  spectral  analysis." 


Figure  20.  Effects  of  Noise  on  a  MEM  Spectrum 
(after  Kay  and  Marple^®) 
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At  this  point  it  is  appropriate  to  quote  two  passages  from  Jaynes  on  this 
subject.  He  says  that  MEM  is  "...  a  precision,  high-performance  machine  , . . 
and  can  deliver  that  high  performance  only  when  fed  high  quality  fuel".  By  this 
he  means  that  one  must  feed  it  exact  autocorrelations,  or  rather  they  must  be 
as  noise  free  as  possible.  A  little  noise  can  push  the  poles  further  away  from 
the  unit  circle.  He  goes  on  to  say  that  MEM  "fails  to  take  noise  into  account,  a 
factor  that  orthodox  methods  do  deal  with  usefully,  and  sometimes  ever  optimally" 
He  then  called  for  a  "full  Bayesian  Solution"  and  went  on  to  point  the  way  for  one 

to  proceed. 
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Abels  in  a  review  article  mentioned  an  interesting  idea.  He  suggested 
maximizing  entropy 


(145) 


29.  Abels,  J.  G.  U974)  Maximum  entropy  spectral  analysis.  Reprinted  in 

Childers,®” 

30.  Childers,  D. G.  (1978)  Modern  Spectrum  Analysis,  IEEE  Press. 


but  with  an  alteration  of  the  constraint  to  the  following  form 


■ 
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P((j)  e  du  - 


2 

a 


(146) 


The  difference  between  this  form  and  the  conventional  one  of  MEM  is  that  here 
2  2 

one  has  a  instead  of  zero  and  the  a  is  taken  to  represent  the  uncertainty  intro¬ 
duced  by  the  noise.  The  above  makes  use  of  Abel’s  continuous  notation,  however, 
the  reader  is  reminded  that  (a)  implementation  requires  a  discrete  formulation 
and  (b)  Eq.  (145)  is  not  valid  in  the  continuous  case  as  was  pointed  out  by  Jaynes 
and  as  has  already  been  mentioned. 

Of  course,  if  one  must  cope  with  noise,  one  can  always  increase  the  order 
of  A(Z)  to  compensate  for  the  change  in  the  signal  model  brought  about  by  that 
noise. 


9.3  The  Limiting  Cases  Test 

Finally,  we  consider  case  (3)  which  was  first  suggested  to  the  present  writer 
by  George  Vanasse.  Figure  21  illustrates  this  case.  Qualitatively  the  argument 
goes  as  follows.  If  one  is  given  a  "short"  autocorrelation  in  the  sense  that  it  looks 
as  if  it  would  "go  on"  in  a  similar  fashion,  then  it  would  appear  from  Figure  21a 
that  one  has  a  good  candidate  for  MEM  (forgetting  for  the  moment  the  problems 
already  discussed  in  cases  (1)  and  (2).  In  contrast,  suppose  one  has  a  case  such 
as  that  given  in  Figure  21b.  The  latter  represents  an  autocorrelation  that  slowly 
"dies"  and  which,  so  far  as  one  can  know,  looks  as  if  it  has  permanently  decayed. 
As  has  been  mentioned,  MEM  linearly  predicts  the  autocorrelation  to  cover  all 
values  of  time.  What  sort  of  prediction  would  one  expect  in  the  predicted  exten¬ 
sions  of  this  autocorrelation?  From  the  physics  of  the  linear  prediction  process 
as  shown  in  Figure  18.  it  is  clear  that  the  predictor  would  predict  zeroes.  In 
other  words  MEM  cannot  give  something  for  nothing. 

We  conclude  this  section  with  Figure  22  which  shows  what  happens  when  one 
performs  various  types  of  analysis.  "MEM  via  Burg  Algorithm"  will  be  described 
in  the  next  section.  "Yule -Walker"  or  YW  refers  to  obtaining  the  coefficients 
from  the  autocorrelations  in  the  manner  described  above.  The  figure  is  from 
Kay  and  Marple, 
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•  ■  -  J 


PSD  >  PSD 


Figure  22.  Various  PSDs  and  the  True  PSD  After  Kay  and  Marple 


10.  THF,  BURG  TECHNIQUE 

The  MEM  method  for  obtaining  PSD's  was  of  course  invented  by  Burg.  How¬ 
ever,  he  is  also  the  inventor  of  the  "Burg  Technique"  which  is  used  to  obtain  the 
a^  coefficients.  It  is  important  to  make  a  distinction  between  these  two  contribu¬ 
tion?  of  Burg  to  avoid  confusion. 

Burg  was  very  interested  in  the  fact  that  the  series  of  reflection  coefficients 
(together  with  the  zero  lag  autocorrelation)  represents  a  new  way  to  embody  the 
"second  order  statistics"  of  a  time  series.  He  thought  that  it  could  be  more 
fundamental  than  the  usual  methods,  namely  the  PSD  and  the  autocorrelation.  In 
addition,  his  choice  of  the  word  "reflection  coefficient"  for  the  C^^'s  rested  on  the 
fact  that,  in  the  analysis  of  seismological  signals  where  acoustic  waves  pass 
through  many  layers  of  earth,  the  resulting  Cj^'s  are  indeed  the  reflection  coeffi¬ 
cients  of  these  layers.  In  fact,  the  "Burg  technique"  can  be  regarded  as  a  "wave 


propagation  model"  where  the  forward  and  backward  predictors  play  the  roles  of 
upward  and  downward  going  waves  (Claerbout,  p.  136). 

Our  motives  for  presenting  the  "Burg  technique"  are  (1)  it  is  the  one  most 
often  used  in  MEM-PSD  analysis,  and  (2)  we  used  it  to  obtain  our  experimental 
results  to  be  described  in  Section  11.  It  should  be  mentioned  that,  in  the  context 
of  Fourier  spectroscopy,  the  Yule-Walker  method,  which  relies  on  the  auto¬ 
correlations  themselves,  would  seem  to  be  the  ideal  method  at  first  sight.  This 
is  of  course  due  to  the  fact  that  the  autocorrelations  themselves  are  the  raw  data 
of  the  instrument.  As  one  will  see  in  Section  11.  there  are  important  exceptions 
to  this  rule.  In  the  following  we  shall  use  Burg's  own  description  of  his  method, 
but  there  are  many  other  treatments  to  be  found  in  the  literature  (such  as 
Robinson  and  Treitel,  Claerbout^  and  Haykin^^). 

Before  we  consider  the  mathematical  details  (which  can  be  omitted  by  the 
reader  not  interested)  it  is  of  interest  to  list,  qualitatively,  the  most  important 
features  of  this  method. 

1.  The  Burg  technique  obtains  the  C^^^'s  (as  well  as  the  a^'s)  directly  from 
raw  time  series  data. 

2.  This  method  completely  avoids  the  problem  which  arises  when  one  calcu¬ 
lates  an  autocorrelation  from  a  finite  piece  of  data.  The  reader  will  recall  that 
the  autocorrelation  is  calculated  from 

q  =  N-r 

X  (147! 

r  N  -  r  q  q+r 

q  =  0 


if  one  follows  Blackman  and  Tukey.  It  is  clear  from  this  that  when  r  gets  close 
to  N,  the  overlap  becomes  small  and  serious  difficulties  can  be  expected.  This 
problem  is  particularly  acute  when  one  has  a  short  data  sample  or  a  collection  of 
such  samples.  This  feature  implies  that,  when  one  has  a  number  of  short  pieces 
of  data,  then  the  MEM-PSD  method,  together  with  the  Burg  technique,  is  the  best 
available  route  to  the  PSD. 

3.  This  method  incorporates  the  Levinson  Recursion  in  its  machinery. 

4.  The  mean  square  error  of  both  the  forward  predictor  and  backward  pre¬ 
dictor  is  minimized,  or  rather  it  is  the  average  of  both  that  is  minimized. 

5.  The  stability  of  the  PE  filter  is  guaranteed.  This  is  not  true  for  the 
Yule-Walker  case. 

6.  Given  raw  data,  this  method  would  provide  a  much  higher  resolution  for 
the  PSD  than  would  the  Yule-Walker  (YW)  case. 

We  now  consider  the  details.  The  best  method  to  describe  this  recursive 
procedure  is  to  consider  the  case  where  one  has  at  the  start  the  N-lth  PE  filter 


(148) 


L  I 


A  -  1  4-  o<N-l)  7  ,  (N-1)  _2  ,  .^(N-D^N-l 

Aj^.l(Z)  -  1  +  ai  Z  +  ag  Z  +...+aj^_^  Z  (148) 

and  where  one  seeks  Aj^(Z).  We  know  from  Levinson  Recursion  that 

A  I7\  -^  +  /l(N-l)  ,  „  ,(N-1)\7  .  .  /  (N-1)  ,  „  ^(N-DX-N+I  ,  p  -N 

A^(Z)-l+^a^  +  jZ+  ...+ jZ  +  Cj^  Z  . 


where  Cj^  is  still  unknown.  It  can  be  obtained  from  the  data  in  the  following  way. 

Suppose  that  the  data  consist  of  a  number  of  short  samples,  for  example 
(Xp,  Xg.  • .  -Xjo),  ^’^100'  ’^101’  ■  ■  ■  ■  ’^llO^  assume  that  one  has 

a  total  of  M  sets  of  these  N+1  tuplets  (where  N  =  10  for  the  two  examples).  The 
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key  calculation  is  to  minimize  the  mean  square  error  e  ,  given  by 

=  (A(Z)X(Z)i2  .  (150) 

The  value  of  this  quantity  over  all  the  pieces  of  data  is  given  by  (m  refers  to 
the  m^^  (N+1)  -•  tuplet,  (Xj.  m,  Xg,  m,  ....  m)  where  m  =  1  to  M. ) 

_  M 

[^N  ^  m  ^  ^  af"!))  Xg^  ^  +  . .  . 

m  =  l  ^ 

■j  I  2 

+  (a<N-^’  +  C  +x  (151) 

\  1  “^N-l  /’^N.  m  ^ ’^N-l,  mj  j  ’ 

where,  in  Eq.  (150),  we  used  the  convolution  theorem  and 


^  W  =  1,  W  SO 
/  j  m  m 


We  now  define 


®m  =  ^N-l’'2.m^-”  -"^l^N.m  +  ^N+l.m 


‘’m  °  *1,  m  ^1  *2,  m-  ^N-1  *N. ; 
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where  e^  and  are  the  forward  and  backward  error  values  respectively. 
Now,  it  can  be  seen  that  Eq.  (151)  can  be  written 


W  e  +  b 
m'  m  N  m 


but  it  is  equally  valid  to  estimate  e  from 


W  b  +  C.,  e 
m'  m  N  m' 


and  for  this  reason  Burg  realized  that  a  better  estimator  would  be  obtained  if  the 
average  of  Eqs.  (153)  and  (154)  were  used  in  place  of  either  one  alone.  Thus  he 
used 


=  W  [le  +C^b 

2  /  .«  m  L'  m  N 


•  +  (b  +  e  1^1 

m'  '  m  N  m'  J 


To  obtain  Cj^,  one  minimizes  in  Eq.  (155)  by  setting  its  derivative  with  respect 
to  equal  to  zero.  The  result  is,  simply. 


(-2)>  W„(b„e„) 
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2  1,2  1 
e  --  b 
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As  the  references  show,  the  denominator  will  always  be  greater  than  the  numera¬ 
tor,  and  it  is  this  fact  which  guarantees  stability  (jc|  £  1).  As  Claerbout,  ^ 

p.  136,  asserts,  the  b  and  e__  vectors  play  the  role  of  "upward  and  downward 
mm 

going  waves"  in  the  seismic  model. 


11.  EXPERIMENTAL  RESULTS 


From  the  AFGL  Fourier  spectrometer,  George  Vanasse  provided  some  data 
in  the  form  of  an  autocorrelation  for  an  experiment  to  test  the  MEM  approach 
on  such  data.  Neil  Grossbard  provided  the  software  and  suggested  the  approach 
that  is  used  here. 

Originally  the  autocorrelations  contained  132,  000  points.  We  analyzed 
8,  200  points  and,  as  will  be  seen,  the  results  were  excellent.  On  the  other  hand, 
we  did  not  analyze  the  original  data,  and,  in  that  sense  the  test  is  incomplete. 
There  is  a  problem  in  the  application  of  MEM-PSD  analysis  to  data  which  contain 
many  lines.  The  132,  000  point  autocorrelations  represented  between  3  to  4 
thousand  spectral  lines.  The  computer  we  used  had  a  core  memory  of  about 
10.  000  numbers.  But,  as  we  have  seen,  to  have  a  single  complex  sinuisoid 

lu  t 

c  °  ,  one  pole  was  needed;  and  hence,  to  have  a  real  spectral  line,  one  would 
need,  as  a  lower  bound,  more  than  two  reflection  coefficients.  This  limitation  on 
MEM  is  the  key  factor  in  its  application  to  optics.  At  least  initially,  we  wished 
to  avoid  the  problem  of  an  exceedingly  long  computer  run  (which  would  be  needed 
if  many  more  points  than  could  be  fit  into  core  were  employed). 

To  circumvent  the  difficulty,  the  original  data  were  pre-filtered  by 
Mark  Esplin  in  a  manner  which  necessitated  the  use  of  a  conventional  Fourier 
spectral  analysis  on  the  original  data  as  well  as  an  inverse  transform.  In  this 
way  we  were  able  to  generate  an  interferogram  that  corresponds  to  a  spectrum  of 
about  100  lines,  however,  the  test  must  be  considered  preliminary  until  actual 
data  are  used.  Because  the  results  of  the  test  were  good,  we  plan  to  carry  out 
the  procedure  on  such  actual  data. 

Another  problem  arose  from  the  fact  that  the  autocorrelations  with  which  we 
began  were  not  positive  definite  (in  the  sense  that  their  Toeplitz  Matrix  did  not 
satisfy  that  condition).  To  cope  with  this  problem,  we  did  not  use  the  Yule- 
Walker  technique  on  the  autocorrelations,  but  instead  did  something  suggested  by 
N.  Grossbard.  We  applied  the  "Burg  Technique"  directly  upon  the  symmetrical 
autocorrelation.  In  this  manner,  the  original  problem  did  not  give  rise  to  insta¬ 
bility  because,  as  was  shown,  the  Burg  Technique  guarantees  stability. 


The  PSD  was  obtained  using 
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and  then  calculating 


PSD(u)  -  (Psurg^^^  ■ 


(157) 


(158) 


Equation  (157)  was  obtained  using  the  Burg  method,  as  mentioned,  and  Eq.  (158) 
was  reported  as  the  "PSD".  The  square  root  was  inserted  due  to  the  fact  that  if 
one  were  to  use  conventional  analysis,  the  Fourier  Transform  of  the  autocorrela¬ 
tion  of  an  autocorrelation  would  be  the  square  of  the  PSD.  Since  this  is  not 
exactly  true  in  our  application,  Eq.  (158)  is  not  really  proven  to  be  mathematically 
correct.  The  fact  that  it  gave  good  results  is  its  only  real  justification.  The 
factor  ;:^x  in  Eq.  (158)  was  inserted  for  dimensional  reasons  (recall  that  Ax  or  At 
is  frequently  set  =  1), 

The  results  are  given  in  Figure  23.  While  the  abscissa  is  unlabeled,  it  is 
identical  to  the  one  in  Figure  24  which  gives  the  132,  000  point  result  obtained  by 
conventional  means.  Notice  that  in  Figure  23,  the  line  pairs  in  Figure  24  labeled 
6,  IR,  18  and  19  were  not  resolved. 

Next,  in  the  spectrum  of  Figure  23,  a  Newton-Raphson  technique  was  used  to 
search  for  the  line  peaks.  Table  1  gives  the  wavenumber  (cm  ^)  locations  found 
in  this  way  and  compares  these  to  the  locations  in  Figure  24.  Considering  the 
reduction  from  132,  000  points  to  8,  200  points,  this  result  is  very  encouraging  in 
spite  of  the  previously  mentioned  prefiltering. 

Finally,  we  estimated  a  quantity  that  was  proportional  to  the  power  of  these 
lines  by  integrating  under  each  peak.  To  do  so,  we  filled  in  the  PSD  estimates 
between  the  half-power  points  (or  local  minima)  and  numerically  integrated. 

Figure  25  shows  a  graph  of  these  results.  The  ordinate's  units  were  arbitrary, 

A  comparison  between  Figure  24  and  Figure  25  shows  that  the  two  families  of 
lines  (large  and  small  peaks)  are  clearly  displayed. 


^Probably  PSD(u)  =  (Pg(u))Ax)^/^ 


is  more  accurate. 


WAVE  NUMBER 


Figure  23.  MEM  Analysis  of  Filtered  Data  of  G.  Vanasse.  A  small 
fraction  of  the  data  was  used. 


These  results  show  us  that,  there  is  excellent  agreement  between  spectral 
line  positions  calculated  by  Blackman -Tukey  with  a  large  number  of  points  and 
MEM  applied  to  a  much  shorter  interferogram.  On  the  other  hand,  the  discrepancy 
in  line  intensities  requires  more  investigation. 

Any  problem  with  core  memory  is  a  hardware  problem.  If  the  PSD  in  ques¬ 
tion  has  too  many  lines  (at  too  high  a  density  to  be  filtered  by  analog  means)  then 
the  solution  is  simply  to  wait  until  the  hardware  becomes  available,  or  else  make 
more  efficient  use  of  existing  core  capacities.  On  the  other  hand,  if  one  has  an 
insufficient  number  of  points  to  calculate  a  sufficient  number  of  PE  coefficients 
(or  reflection  coefficients)  in  relation  to  the  number  of  actual  spectral  lines 
present,  then  the  MEM  procedure  will  fail. 


224  00  226  00  228  00  23000  232  00  234  00 

WAVENUMBER  (CM"') 

Figure  24.  Original  FFT  Analysis  on  Entire  Data  Sample. 
The  numbers  refer  to  the  same  lines  in  Figures  23—25 


Figure  25.  Graph  of  Integrated  Lines  Derived  From  MEM  F*SD  of 
Figure  23.  There  is  excellent  agreement  with  the  positions  of  the 
lines  in  Figure  24,  but  the  intensities  disagree  by  a  large  amount. 
Ordinate  scale  is  arbitrary 


Table  1.  Comparison  of  Line  Positions  From  Fourier 
Transform  and  Burg  MEM 


Wave  No.  of 
Fourier  Transform 


224.1830 

224.  1773 

224. 9531 

224. 9395 

225. 1736 

225.  1774 

225. 5168 

225.  5104 

226.2935  1 

226.3258 

226.3266 

226.  8234 

226. 8076 

227.4541 

227. 4449 

227. 6106 

227. 6134 

228. 1040 

228.0849 

228. 5585 

228. 5522 

228.  9092 

228. 8937 

229.3574 

229.3486 

229. 6374 

229.  6270 

230. 1739 

230. 1893 

230. 5861 

230.  6805 

230. 6920 

231.4190 

231.4182 

231.  7232 

231. 7356 

231.7877 

232.  6393 

232.  7046 

232. 7301 

1 

232.9632 

232.9711 

233.7132 

233.7046 

234.1133  1 

233.902 

12.  REFLECTION  COEFFICIENTS  -  THE  WAVE  OF  THE  FUTURE 


Historically,  it  was  the  PSD  (via  the  prism  or  grating)  of  electromagnetic 
waves  which  eventually  led  to  the  creation  of  Quantum  Mechanics.  It  was  the 
spectral  line  of  atomic  radiation  that  led  to  the  Bohr  Atom,  The  PSD  has  also 
played  an  essential  role  in  time  series  analysis,  more  so  than  its  counterpart,  the 
autocorrelation.  The  latter,  via  the  Wiener  Khintchine  Theorem,  contains  the 
same  "information”,  but  it  is  mainly  used  as  a  way  to  get  the  PSD  (as  we  have 
seen). 

As  has  already  been  mentioned.  Burg  has  given  us  yet  another  fundamental 
way  to  represent  the  "second  order  statistics"  contained  in  the  PSD  and  auto¬ 
correlation.  This  new  representation  consists  of  the  zeroth  lag  autocorre¬ 
lation.  plus  the  reflection  coefficients,  or  (<(iq,  Cg . ^0 

corresponds  to  the  scale  of  the  PSD,  and  the  C^^'s  determine  its  shape.  Further¬ 
more,  Burg  has  suggested  that  this  new  representation  may  be  even  more  funda¬ 
mental  than  the  PSD.  In  the  opinion  of  the  present  author,  this  idea  is  worthy  of 
consideration. 

Burg  shows  that  there  is  a  one-to-one  correspondence  between  the  autocor¬ 
relation  and  (<Jq,  C^,  ...  Cp^).  To  see  this,  recall  the  recursion  relations,  (111), 


N-1 

^  '*’N-n^n 
n=0 


C 


N 


(159) 


(160) 


(161) 


Burg  showed,  incidentally,  that  whenever  Ic^^l  =  1,  the  recursion  terminates,  the 
poles  jump  to  the  unit  circle,  and  the  resulting  time  series  becomes  deterministic. 
(The  last  is  due  to  the  fact  that  the  prediction  error  becomes  zero  as  shown  by 
Eq.  (161.) 

Combining  Eqs.  (159)  and  (160),  we  obtain 


N-1 


'N 


6  -  C  P 

^N-n  ^N- 


(162) 


Parenthetically,  it  should  be  pointed  out  that  Eq.  (162)  shows  how  to  extend  the 
autocorrelation;  and  Burg  has  proven  that  the  MEM-PSD  could  be  obtained  by 
conventional  means  (Wiener-Khintchine  Theorem)  from  these  autocorrelations 
by  using  Eq.  (162)  to  extend  them  to  ±oo.  In  any  case,  Eq.  (162)  shows  that  it  is 
possible  to  put  the  into  one-to-one  correspondence  with  the  given  auto¬ 

correlations.  The  recursion,  it  may  be  shown,  can  be  run  in  both  directions  in 
the  parameter  N.  More  details  are  given  in  Burg's  thesis. 

There  is  a  similar  one-to-one  correspondence  between  the  PSD  and  the  0^, 
Cj . Cjj  representation.  The  PSD  is 


as  is  now  very  familiar.  The  Aj^(Z)  can  be  shown  to  satisfy 


Aj^^.l(Z)  =  Aj^(Z)  +  Z^  Aj^(l/Z) 


Eq.  (164)  is  derived  from 


together  with  the  recursion  of  Levinson 


(N+1)  .  (N) 

^0  ' 


(N+l)  .  (N)  ^  „  (N) 


(N+1)  .  (N)  ^  (N) 


C  a' 


(Compare  Eq,  (111)  and  see  p.  166  of  Robinson  and  Treitel^^,)  Using  Z  =  e*'**^, 
in  Eq,  (164),  we  obtain 


Ajj^j(w)  =  Apj(u.)  +  ^ 


which  is  the  frequency  domain  representation  of  Levinson  Recursion.  This  re¬ 
lates  the  PSD  to  the  Cj^,  .  .  .  Cj^)  through  Eq.  (163). 

We  now  come  to  an  interesting  question.  "When  is  the  (j)^,  C^,  ...  Cj^ 
representation  superior  in  some  way  to  the  PSD  or  to  the  autocorrelation?"  One 
application  is  prospecting  for  oil  by  analyzing  acoustic  signals  (due  for  example  to 
an  explosion)  which  contain  information  about  the  location  of  the  layers  under  the 
ground.  Figure  26  shows  how  a  signal  pulse  can  give  the  location  of  the  layers 
but  how  reverberations  can  make  the  signal  very  complex.  The  important  point 
is  that,  in  the  present  examples,  the  CVs  contain  the  physics  of  the  situation  in 
the  most  practical  manner  of  speaking.  Also,  as  has  been  mentioned,  it  was  this 
particular  field  of  research  that  led  Burg  to  his  "technique"  as  well  as  to  his  MEM 
PSD. 

There  is  another  place  where  the  reflection  coefficients  represent  the  physics 
behind  the  signal.  This  is  in  the  field  of  speech  compression.  For  reasons  of 
economics  it  is  very  important  to  find  ways  to  represent  speech  signals  concisely. 
The  signal  is  therefore  "parameterized"  and  these  numbers  are  transmitted.  An 
earlier  method  consisted  of  sending  the  pitch  (fundamental  frequency),  together 
with  linear  prediction  coefficients.  The  sounds  were  regenerated  at  the  other  end, 
(A  "voiced"  vs  "non-voiced"  parameter  was  included).  Later  on  it  was  found  that 
there  were  many  "round-off"  problems  that  could  be  solved  if  the  linear  prediction 
coefficients  were  replaced  by  reflection  coefficients.  This  was  pointed  out  to  the 
present  author  by  Caldwell  Smith  of  RADC,  The  explanation  probably  has  a  rela¬ 
tion  to  the  following  interesting  fact.  Figure  27  represents  an  "acoustic  tube" 

model  for  the  vocal  tract.  This  model  plays  a  fundamental  role  in  the  represents - 

22 

tion  of  speech  sounds  (see  Robinson  and  Silvia  ).  In  this  case,  the  reflection 
coefficients  correspond  to  the  size  of  the  cross-sections  of  the  "pieces  of  tube"  in 
the  model.  Again  the  Cj^'s  have  a  physical  significance. 

Let  us  now  consider  the  question  of  the  future  role  of  the  ((/)p,  C^.  . . .  C^j) 
representation  in  Fourier  Spectroscopy.  The  question  reduces  to  the  following. 
Since  Fourier  Spectroscopy  is  mainly  used  to  identify  molecular  and  atomic 
species,  does  the  new  representation  in  some  way  present  us  with  a  more  econom¬ 
ical  way  to  do  the  job?  To  a  fertile  mind,  such  a  question  may  present  many  ram¬ 
ifications.  For  example,  would  it  not  be  interesting  if  a  PSD  could  be  "decon¬ 
volved"  in  a  manner  to  render  species  identification  more  simple? 


According  to  Robinson  et  al,  prospecting  since  the  1960's  would  not  have  been 
possible  without  data  analysis.  This  is  due  to  the  fact  that  since  that  time,  the 
nature  of  the  locations  of  the  oil  were  such  that  it  was  necessary  to  apply 
"deconvolution"  techniques  and  vast  amounts  digital  signal  processing  (a  trillion 
bits  a  day  at  present)  merely  to  find  the  layers. 


Figure  26a.  First  Reflection  From  Layers 


Figure  26b.  Reverberation.  All  combinations  of  reflection  and  reverberation 
occur,  which  make  the  initial  pulse  6  result  in  a  complicated  signal 


Figure  27.  A  Diagram  That  Depicts 
an  "  Acoustic  Tube"  Model,  Models 
such  as  this  one  are  used  to  analyze 
speech  signals.  The  reflection  co¬ 
efficients  are  the  areas  of  the  tube 
sections 


If  it  should  turn  out  that  the  new  representation  were  to  present  us  with  a  way 
to  realize  significantly  great  simplifications,  then  this  in  turn  would  suggest  that 
the  reflection  coefficients  may  have  a  physical  significance  in  quantum  mechanics. 
If  anything,  at  least  it  is  an  interesting  idea. 

13.  CONCLUSION 

The  relatively  recent  method  of  MEM-PSD  analysis  promises  to  aid  the 
technology  of  Fourier  Spectroscopy.  Its  main  advantage  is  that  under  appropriate 
conditions,  it  permits  a  drastic  reduction  in  the  number  of  points  of  the  initial 
interferogram  without  significant  loss  of  resolution.  In  addition,  it  may  lead  to 
still  further  developments  in  spectral  analysis  due  to  the  entirely  new  conceptual 
basis  and  philosophy  which  underlies  the  mathematics  of  the  technique. 
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